mjackie

Reputation: 173

Twitter dataset filtering for only English language text using Python

Is there a way to filter an already processed dataset so that only English-language text remains, using Python? Maybe some NLTK feature or something similar. The data was extracted from Twitter, and its format is the following:

<tweetid>, <username>, <userid> &8888 <tweet text>

Filtering at the stream level is not an option, since I only have the data in the format shown above. Any help will be appreciated, thanks.

Upvotes: 1

Views: 1946

Answers (1)

aerin

Reputation: 22654

What you need is a language detection module, for example the one in TextBlob.

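from textblob import TextBlob
# detect_language() queries the Google Translate web API (network access needed); deprecated since TextBlob 0.16
TextBlob('your tweet').detect_language()  # returns a language code such as 'en'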
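
To apply a detector to the whole dataset, something like the following might work. This is only a sketch under a few assumptions: that `&8888` in the question's format is a literal separator before the tweet text, and that each record sits on one line. The names `filter_english` and `looks_english` are made up for illustration, and `looks_english` is a crude stopword heuristic used as a stand-in so the example runs without network access; a real detection library can be passed in through the `is_english` parameter instead.

```python
import re

# Assumption: "&8888" is a literal separator between the metadata and the tweet text.
SEPARATOR = "&8888"

# Hypothetical stand-in detector data: a handful of common English function words.
ENGLISH_STOPWORDS = {
    "the", "and", "is", "are", "to", "of", "in", "it", "you",
    "that", "this", "for", "was", "on", "with", "my",
}


def looks_english(text, threshold=0.2):
    """Crude heuristic: share of tokens that are common English words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= threshold


def filter_english(lines, is_english=looks_english):
    """Yield only the records whose tweet text passes the is_english check."""
    for line in lines:
        _meta, sep, text = line.partition(SEPARATOR)
        if not sep:
            continue  # skip lines that do not contain the separator
        if is_english(text.strip()):
            yield line


sample = [
    "1, alice, 11 &8888 This is the best day of my life",
    "2, bob, 22 &8888 Hoy es el mejor día de mi vida",
    "3, carol, 33 malformed line without separator",
]
english = list(filter_english(sample))
```

The heuristic will misclassify short or slang-heavy tweets, so for real data it is probably better to swap in a proper detector, e.g. `filter_english(lines, is_english=lambda t: my_detector(t) == 'en')`, where `my_detector` is whatever detection function you end up using.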

Upvotes: 2
