Reputation: 51
I am new to machine learning, and I wonder if it's possible to apply machine learning to the following case.
Imagine I am passing a mixed-language string (English + anything else) to a machine learning library, and I expect the library to tell me whether this string has been fully translated from English to the target language or not. For example:
Example 1:
Example 2:
input:
"请upload您的文件" # (please upload your file in Chinese)
expected result:
Needs further translation (to Chinese), as "upload" is an action that should be translated.
If machine learning can be applied to this, how should I choose the dimensions of the input string, and which algorithm should I pick (logistic regression or a neural network)?
Thanks
Upvotes: 1
Views: 66
Reputation: 1423
Natural language processing is a large and diverse field. You can think about your example in a number of ways.
The first is character sets and symbol encoding. Many languages are written with characters outside the basic 26-letter Latin alphabet. If a string contains characters from both inside and outside the core character range of the target language, you can flag it as partly untranslated, which works around needing a lot of dictionaries.
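As a rough illustration of the character-range idea, here is a minimal Python sketch. It assumes the target language (Chinese, as in the question) does not normally use basic Latin letters, so any run of `A-Z`/`a-z` is treated as leftover English. The function names `untranslated_fragments` and `needs_translation` are made up for this example, and the check would not work for target languages that share the Latin alphabet with English.

```python
import re

# Assumption: the target language does not use basic Latin letters,
# so any run of A-Z / a-z is treated as leftover English.
LATIN_RUN = re.compile(r"[A-Za-z]+")

def untranslated_fragments(text):
    """Return every run of basic Latin letters found in text."""
    return LATIN_RUN.findall(text)

def needs_translation(text):
    """True if the string still contains at least one Latin-letter run."""
    return bool(untranslated_fragments(text))

print(untranslated_fragments("请upload您的文件"))  # ['upload']
print(needs_translation("请上传您的文件"))          # False
```

Brand names, file extensions, or placeholders such as `%s` would also be flagged by this check, so in practice you would probably want an allow-list on top of it.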
The second is to collect example words or sentences in each language as a training set and use Naive Bayes classification to associate words with languages.
You may be able to go further with stem detection and more, but I haven't studied those well enough. Consider posting on Cross Validated.
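To make the Naive Bayes idea concrete, here is a small sketch using only the Python standard library. It is a toy multinomial Naive Bayes with add-one smoothing, not a production classifier. The class name `WordLanguageNB`, the four training sentences, and the language labels are all invented for illustration. It also assumes words can be separated by whitespace, which does not hold for Chinese; there you would need a word segmenter or character n-grams instead.

```python
import math
from collections import Counter, defaultdict

class WordLanguageNB:
    """Toy multinomial Naive Bayes over words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # language -> word -> count
        self.doc_counts = Counter()              # language -> number of samples
        self.vocab = set()                       # all words seen in training

    def train(self, samples):
        for text, lang in samples:
            words = text.lower().split()  # assumption: whitespace-separated words
            self.word_counts[lang].update(words)
            self.doc_counts[lang] += 1
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_lang, best_score = None, -math.inf
        for lang, counts in self.word_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[lang] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang

# Made-up training data, far too small for real use.
samples = [
    ("please upload your file", "en"),
    ("save the file and close", "en"),
    ("por favor suba su archivo", "es"),
    ("guarde el archivo y cierre", "es"),
]

model = WordLanguageNB()
model.train(samples)
print(model.predict("upload the file"))  # en
print(model.predict("suba el archivo"))  # es
```

To detect a partially translated string, one option would be to classify each word separately and flag the string when some words are assigned to English rather than the target language.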
Upvotes: 1