Reputation: 1635
Is there a library where I can simply call a method on a string to find out if it is non-English? I'm trying to save only English strings, and the incoming stream of strings contains plenty of non-English ones.
Upvotes: 0
Views: 1841
Reputation: 27911
You can try to use linguo.
"your string".lang
# will return "en" for English strings
Disclaimer: I'm the creator of this gem.
Upvotes: 3
Reputation: 1032
You can use the Google Translate API with the RailsBridge for it: http://code.google.com/apis/gdata/articles/gdata_on_rails.html
Upvotes: 1
Reputation: 40277
Not that I'm aware of, but you could load this word list into an array (http://www.langmaker.com/wordlist/basiclex.htm) and then match the string's words against it. Decide on some percentage of matching words as good enough, and go from there.
You could even use a Bayesian algorithm here to mark those words as "good" and learn from there, but that might be overkill.
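A rough sketch of the word-list idea, in case it helps. The names `BASIC_ENGLISH`, `english_ratio` and `probably_english?` are made up for illustration, the tiny word set is only a stand-in for a real list such as the one linked above, and the 0.5 threshold is an arbitrary starting point you would want to tune against your own data.

```ruby
require "set"

# Hypothetical stand-in vocabulary; in practice, load a real word list
# (for example, one word per line from a file) into the Set instead.
BASIC_ENGLISH = Set.new(%w[
  the a an is are was and or of to in it this that i you we they
  have has not for on with my your be do at
])

# Fraction of the words in text that appear in vocabulary.
# Only runs of ASCII letters and apostrophes count as words, so text in
# other scripts yields no words and therefore a ratio of 0.0.
def english_ratio(text, vocabulary = BASIC_ENGLISH)
  words = text.downcase.scan(/[a-z']+/)
  return 0.0 if words.empty?

  words.count { |word| vocabulary.include?(word) } / words.size.to_f
end

# True when at least threshold of the words are in the vocabulary.
# The default of 0.5 is an assumption, not a recommended value.
def probably_english?(text, threshold: 0.5)
  english_ratio(text) >= threshold
end
```

With that in place, something like `strings.select { |s| probably_english?(s) }` would keep only the strings that pass. Short strings and strings full of proper nouns will be misjudged more often, so a larger word list helps.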
Upvotes: 0