Is there a way, using Python, to filter an already processed dataset so that only English-language text remains? Maybe some NLTK feature or something like that. The data was extracted from Twitter, and its format is the following:
<tweetid>, <username>, <userid> &8888 <tweet text>
Stream filtering is not an option, since I only have the data in the format shown above. Any help will be appreciated, thanks.
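Whatever detector ends up being used, each line first has to be split into its fields so that only the tweet text is passed to it. The following is a minimal sketch. It assumes the three header fields are comma-separated and that the literal &8888 marks the start of the tweet text. The names Tweet, parse_line and read_tweets are hypothetical, chosen for illustration.

```python
from typing import Iterator, NamedTuple, Optional

# Assumed marker between the header fields and the tweet text.
SEPARATOR = "&8888"


class Tweet(NamedTuple):
    tweet_id: str
    username: str
    user_id: str
    text: str


def parse_line(line: str) -> Optional[Tweet]:
    """Parse '<tweetid>, <username>, <userid> &8888 <tweet text>'.

    Returns None for lines that do not match the assumed layout.
    """
    # Split on the first marker only, so the tweet text may itself contain '&8888'.
    header, sep, text = line.partition(SEPARATOR)
    if not sep:
        return None
    # maxsplit=2: exactly three header fields are expected.
    parts = [p.strip() for p in header.split(",", 2)]
    if len(parts) != 3:
        return None
    tweet_id, username, user_id = parts
    return Tweet(tweet_id, username, user_id, text.strip())


def read_tweets(path: str) -> Iterator[Tweet]:
    """Yield parsed tweets from a file, silently skipping malformed lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tweet = parse_line(line.rstrip("\n"))
            if tweet is not None:
                yield tweet
```

For example, parse_line("12345, alice, 678 &8888 Hello world, how are you?") gives a Tweet whose text field is "Hello world, how are you?". Commas inside the tweet text are safe because the split on the marker happens before the header is split.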
What you need is a language detection module, for example the one in TextBlob:
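from textblob import TextBlob
TextBlob('your tweet').detect_language()  # returns a language code such as 'en'

Note that detect_language() sent the text to Google Translate, so it needs network access. It was deprecated and later removed from TextBlob, so it may not exist in the version you have installed.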
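If you would rather not depend on an online service, a crude offline alternative is a stopword-ratio heuristic. The idea is to count what fraction of the words in a tweet are common English function words. This is a different technique from TextBlob's detector and is much less accurate, especially on very short tweets. The small stopword set and the 0.2 threshold below are assumptions you would need to tune on your data. NLTK offers a fuller list via nltk.corpus.stopwords.words("english") after nltk.download("stopwords"). The names looks_english and filter_english are hypothetical.

```python
import re
from typing import Callable, Iterable, Iterator

# Hand-picked, deliberately small set of very common English words (an assumption).
ENGLISH_STOPWORDS = frozenset({
    "the", "a", "an", "and", "or", "but", "is", "are", "was", "were", "be",
    "to", "of", "in", "on", "at", "for", "with", "it", "this", "that",
    "i", "you", "he", "she", "we", "they", "my", "your", "not", "do",
    "have", "has", "what", "how", "so", "just", "me", "all", "can",
})

# Runs of Unicode letters (no digits or underscores), so accented words stay whole.
WORD_RE = re.compile(r"[^\W\d_]+")
# URLs, @mentions and #hashtags carry little language signal.
NOISE_RE = re.compile(r"https?://\S+|[@#]\w+")


def looks_english(text: str, threshold: float = 0.2) -> bool:
    """Heuristic: True if at least `threshold` of the words are English stopwords."""
    cleaned = NOISE_RE.sub(" ", text.lower())
    words = WORD_RE.findall(cleaned)
    if not words:
        return False
    hits = sum(1 for w in words if w in ENGLISH_STOPWORDS)
    return hits / len(words) >= threshold


def filter_english(
    texts: Iterable[str],
    is_english: Callable[[str], bool] = looks_english,
) -> Iterator[str]:
    """Yield only the texts the given detector classifies as English."""
    for text in texts:
        if is_english(text):
            yield text
```

The detector is a parameter so a real language identifier can be swapped in. With the third-party langdetect package, for instance, you could pass is_english=lambda t: detect(t) == "en" after from langdetect import detect. That detect call raises LangDetectException on text with no usable features (for example, a tweet that is only a URL), so in practice you would wrap it in a try/except.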