user2966813

Reputation: 51

Applying machine learning to analyze mixed-language text

I am new to machine learning, and I wonder whether it can be applied to the following case.

Imagine I pass a mixed-language string (English plus any other language) to a machine learning library, and I expect the library to tell me whether the string has been fully translated from English into the target language. For example:

Example 1:

Example 2:

If machine learning can be applied here, how should I choose the dimensionality of the input string, and which algorithm should I pick (logistic regression or a neural network)?

Thanks

Upvotes: 1

Views: 66

Answers (1)

mobiusklein

Reputation: 1423

Natural language processing is a large and diverse field. You can approach your example in a number of ways.

The first is character sets and symbol encoding. Many languages use characters outside the basic 26-letter Latin alphabet. If a string contains characters from both inside and outside the core character range of the target language, it is probably only partially translated. This approach avoids needing a lot of dictionaries.
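
As a rough illustration of the character-range idea, here is a minimal Python sketch. It assumes the target language uses a non-Latin script, and it approximates a character's script by the first word of its Unicode name, which is a heuristic rather than the real Unicode Script property. The helper names `script_counts` and `is_fully_translated` are made up for this example.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, using the first word of each Unicode
    character name (e.g. 'LATIN', 'CYRILLIC') as a rough script label."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def is_fully_translated(text, allowed_scripts):
    """Return True if every letter in text belongs to an allowed script.

    allowed_scripts is a set of labels such as {"CYRILLIC"}; what belongs
    in it for a given target language is an assumption you must supply.
    """
    return all(script in allowed_scripts for script in script_counts(text))


# Cyrillic-only text passes; leftover English words are flagged.
print(is_fully_translated("Привет мир", {"CYRILLIC"}))
print(is_fully_translated("Привет world", {"CYRILLIC"}))
```

This cannot help when the target language also uses the Latin alphabet (French or German, say), and it will flag brand names or acronyms that are intentionally left in English.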

The second is to take a set of example words or texts in each language as a training set and use Naive Bayes classification to associate words with languages.
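
To make the Naive Bayes idea concrete, here is a small self-contained sketch using only the Python standard library. It uses character trigrams as features with add-one smoothing, which is one common choice and not the only one. The class name, the helper `languages_in`, and the tiny word lists are all hypothetical; a real training set would need far more data.

```python
import math
from collections import Counter, defaultdict


def ngrams(word, n=3):
    """Split a word into overlapping character n-grams, padded with spaces."""
    padded = f" {word.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageGuesser:
    """Multinomial Naive Bayes over character trigrams (illustrative only)."""

    def __init__(self):
        self.ngram_counts = defaultdict(Counter)  # language -> trigram counts
        self.word_counts = Counter()              # language -> number of words
        self.vocab = set()                        # all trigrams seen in training

    def train(self, language, words):
        for word in words:
            self.word_counts[language] += 1
            for gram in ngrams(word):
                self.ngram_counts[language][gram] += 1
                self.vocab.add(gram)

    def predict(self, word):
        total_words = sum(self.word_counts.values())
        best_language, best_score = None, -math.inf
        for language, n_words in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_words / total_words)
            total = sum(self.ngram_counts[language].values())
            for gram in ngrams(word):
                count = self.ngram_counts[language][gram]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_language, best_score = language, score
        return best_language


def languages_in(guesser, sentence):
    """Return the set of languages predicted for the words in a sentence."""
    return {guesser.predict(word) for word in sentence.split()}


# Toy training data, invented for this example.
guesser = NaiveBayesLanguageGuesser()
guesser.train("english", ["the", "house", "water", "thing", "with", "where"])
guesser.train("french", ["le", "maison", "eau", "chose", "avec", "où"])

# More than one language in the result suggests an incomplete translation.
print(languages_in(guesser, "house maison"))
```

Classifying word by word sidesteps the question of a fixed input dimension for the whole string: each word becomes a bag of trigrams, and the string is judged by whether any word is still predicted as English.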

You may be able to go further with stem detection and other techniques, but I haven't studied them well enough to say. Consider posting on Cross Validated.

Upvotes: 1
