Reputation: 37904
I installed the following (I am on Windows 7, using a virtualenv
with Python 2.7.5):
pip install pyenchant
pip install 3to2
pip install https://bitbucket.org/spirit/guess_language/downloads/guess_language-spirit-0.5.tar.bz2
and did:
>>> from guess_language import guess_language
>>> guess_language("Hello World")
u'UNKNOWN'
Why am I getting u'UNKNOWN'?
Upvotes: 1
Views: 525
Reputation: 57670
I suggest you use nltk for this. It will be much easier with nltk.
    import nltk

    # Needs the stopwords corpus: nltk.download('stopwords')
    STOPWORDS_DICT = {lang: set(nltk.corpus.stopwords.words(lang))
                      for lang in nltk.corpus.stopwords.fileids()}

    def get_language(text):
        # Score each language by how many of its stopwords occur in the text.
        words = set(nltk.wordpunct_tokenize(text.lower()))
        return max(((lang, len(words & stopwords))
                    for lang, stopwords in STOPWORDS_DICT.items()),
                   key=lambda x: x[1])[0]
Now see the code in action.
In [28]: get_language('hello world')
Out[28]: 'swedish'
In [30]: get_language('stackoverflow is a nice website')
Out[30]: 'english'
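The problem is that if the sample text is very short, it can give the wrong result. In the 'hello world' case, most likely no stopwords match at all, so every language scores 0 and max() just returns whichever language it happens to see first.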
The code is taken from this site.
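One way to guard against the short-text problem is to return nothing when the evidence is too weak, instead of an arbitrary language. The following is only a rough sketch of that idea, not part of the original code: it does not use nltk, the tiny STOPWORDS sets are made up for illustration, and the names get_language_or_none and min_hits are hypothetical.

```python
# Tiny, hypothetical stopword sets for illustration only; real use would
# load fuller lists (for example from nltk.corpus.stopwords).
STOPWORDS = {
    "english": {"the", "is", "a", "and", "of", "to", "in"},
    "spanish": {"el", "la", "es", "y", "de", "en", "un"},
    "german": {"der", "die", "das", "ist", "und", "ein", "zu"},
}

def get_language_or_none(text, min_hits=1):
    """Return the best-matching language, or None when evidence is too weak."""
    words = set(text.lower().split())
    # Count how many stopwords of each language occur in the text.
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    # Reject when too few stopwords matched or when the top score is tied.
    tied = sum(1 for s in scores.values() if s == top) > 1
    if top < min_hits or tied:
        return None
    return best
```

With this variant, get_language_or_none('hello world') returns None rather than a guess, while a sentence containing a few stopwords still resolves to a language. Raising min_hits makes it stricter at the cost of rejecting more short inputs.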
Upvotes: 2