Reputation: 1694
Could you recommend some evaluation methods or metrics for machine translation, for example Japanese to English? If possible, could you also point me to some papers about these metrics? I am new to translation. Thanks!
Upvotes: 0
Views: 122
Reputation: 148
Despite ongoing criticism and debate, starting with this 2006 article, the BLEU (BiLingual Evaluation Understudy) score is still the most commonly used metric for machine translation. According to the Wikipedia page,
BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation, the better it is" – this is the central idea behind BLEU. BLEU was one of the first metrics to achieve a high correlation with human judgements of quality, and remains one of the most popular automated and inexpensive metrics.
More specifically, if you want to look at Japanese-to-English translation, there was a class project from Stanford CS 224d that translates simple Japanese sentences like 「彼女は敵だった」 ("She was an enemy") into English with neural network techniques, and uses BLEU as the evaluation metric.
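To make BLEU's core idea concrete, here is a minimal single-reference sketch in plain Python (real toolkits such as NLTK or sacrebleu add multi-reference support and smoothing, which this sketch omits): it computes clipped n-gram precisions for n = 1..4, takes their geometric mean, and applies a brevity penalty for candidates shorter than the reference.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: geometric mean of modified
    (clipped) n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipping: each candidate n-gram counts at most as many
        # times as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(1, len(candidate) - n + 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        # Without smoothing, any zero precision makes BLEU zero --
        # common for short sentences, hence smoothed variants in practice.
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / max(1, len(candidate)))
    return bp * geo_mean

reference = "she was an enemy".split()
print(bleu("she was an enemy".split(), reference))  # perfect match -> 1.0
print(bleu("the the the".split(), reference))       # no overlap -> 0.0
```

Note that an exact match scores 1.0 and a translation sharing no n-grams with the reference scores 0.0; everything else falls in between, which is why BLEU is reported as a corpus-level score in practice rather than per sentence.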
If you want to read more on machine translation, a good starting point is one of the most influential recent papers, Neural Machine Translation by Jointly Learning to Align and Translate by Bahdanau, Cho, and Bengio. You can also look at the top papers citing the BLEU critiques to get a sense of other commonly used metrics.
Upvotes: 1