What is the best way to determine the language of Twitter posts?
The streaming API includes a language parameter, but it doesn't seem to be very accurate: even many Japanese posts are labelled as English.
What have others done to sort out the languages?
I've had very good results with this PHP package: http://pear.php.net/package/Text_LanguageDetect/
It is fast and open source. We use it to select English-only posts for a site we run at http://2012twit.com.
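As far as I know, Text_LanguageDetect works by comparing the character trigram statistics of a text against stored per-language profiles. The Python sketch below illustrates that general idea only; it is not the package's algorithm or API. The two training samples, `PROFILES`, `detect()` and the kana shortcut are hypothetical illustrations, and a real detector would be trained on far more text per language. The kana check is an extra heuristic aimed at the "Japanese posts labelled as English" case from the question.

```python
import math
from collections import Counter

# Hypothetical, tiny training samples; a real profile needs much more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog. this is what the people "
          "think about the weather and the news today. i have been watching "
          "the game with my friends and it was great.",
    "es": "el rápido zorro marrón salta sobre el perro perezoso. esto es lo que "
          "la gente piensa sobre el tiempo y las noticias de hoy. he estado "
          "viendo el partido con mis amigos y fue genial.",
}


def trigrams(text: str) -> Counter:
    """Count character trigrams; padding with spaces lets word edges count."""
    cleaned = " ".join(text.lower().split())
    padded = f"  {cleaned} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_kana(text: str) -> bool:
    """Heuristic: Hiragana/Katakana (U+3040..U+30FF) strongly suggest Japanese."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


# One trigram profile per language, built from the hypothetical samples.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def detect(text: str, profiles: dict) -> str:
    """Return the language whose profile is most similar, or 'unknown'."""
    if has_kana(text):
        return "ja"
    grams = trigrams(text)
    scores = {lang: cosine(grams, prof) for lang, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    for post in ["I think the weather is great today",
                 "la gente piensa que el partido fue genial",
                 "今日はとても暑いですね"]:
        print(detect(post, PROFILES), post)
```

Tweets are short, so they yield few trigrams and the scores get noisy; a cheap script check like the kana test catches the most obvious mislabels before any statistical comparison runs.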
Google has language detection built into its Translate API, if using evil external services is an option:
http://code.google.com/apis/language/translate/v1/reference.html#detectResult