PyTextCat
PyTextCat guesses which of more than 70 languages a given input text is written in.
It is an implementation of the classification technique described by William B. Cavnar & John M. Trenkle (1994) in N-Gram-Based Text Categorization, and is based upon Gertjan van Noord's Perl implementation.
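The technique can be illustrated with a minimal sketch. It is not PyTextCat's actual API: the function names (ngram_profile, out_of_place, guess_language) are hypothetical, and the toy profiles are built from one sentence each, whereas TextCat ships pre-trained lm files. The n-gram lengths (1 to 5) and the profile size (300) follow the values used in the Cavnar & Trenkle paper.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    # Keep runs of letters only and pad each token with "_" as a word boundary.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + token + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Toy training data (an assumption for illustration, far too small for real use).
samples = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "dutch": "de snelle bruine vos springt over de luie hond en de kat",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
print(guess_language("the dog and the fox", profiles))
```

Ranking n-grams instead of comparing raw counts keeps the measure largely independent of text length, which is why short inputs can still be matched against profiles built from much longer training texts.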
Both a Python library and a command-line interface are provided.
PyTextCat is released under GPLv2 (see COPYING). The lm files and test texts are from TextCat, and are therefore licensed under LGPLv2.1 (see COPYING.LGPL). Source code is available for download on GitHub.
Demo
Note that this online demo probably does not yet handle text containing Unicode characters properly.
Page last updated: 2009-11-04





