Reputation: 969
I'm aware that there are many different evaluation metrics, such as BLEU, NIST and METEOR. They all have their pros and cons, and their effectiveness differs from corpus to corpus. I'm interested in real-time translation, where two people hold a conversation by typing a couple of sentences at a time and having them translated immediately.
What kind of corpus would this count as? Would the text be considered too short for proper evaluation by most conventional methods? Would the fact that the speaker is constantly switching make the context more difficult?
Upvotes: 0
Views: 165
Reputation: 1
Your corpus would count as chat, or as a kind of question answering. If you have many sentence suggestions available, you could try https://gitlab.com/Bachstelze/translation-metric/tree/master/ It is a vector space model approach that works at the sentence level, so you don't have to train a language-specific system, and the switching between speakers shouldn't be a problem as long as the sentences don't get too short.
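To give a rough idea of what a sentence-level vector space comparison looks like, here is a minimal sketch that turns two sentences into bag-of-words count vectors and takes their cosine similarity. This is not taken from the linked project, whose actual representation I have not checked; the function name `cosine_similarity` and the whitespace tokenization are assumptions for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a: str, sentence_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (illustrative only)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    vec_a = Counter(sentence_a.lower().split())
    vec_b = Counter(sentence_b.lower().split())

    # Dot product over the words the two sentences share.
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))

    # An empty sentence has no direction in the vector space.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("the cat sat", "the cat ran"))
```

With very short sentences the vectors have only a handful of non-zero entries, so a single differing word changes the score a lot, which is one way to see why sentence length matters here.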
Upvotes: 0
Reputation: 1229
What you are asking for belongs to the domain of Confidence Estimation, nowadays better known within the Machine Translation (MT) community as Quality Estimation, i.e. "assigning a score to MT output without access to a reference translation".
For MT evaluation (using BLEU, NIST or METEOR) you need: (1) a hypothesis translation (the MT output), and (2) a reference translation.
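Reference-based metrics score the MT output by how much it overlaps with a human reference. As a rough sketch, the clipped n-gram precision at the core of BLEU can be written as below; this is only one ingredient (real BLEU also combines several n-gram orders and applies a brevity penalty), and the helper names `ngrams` and `clipped_precision` as well as the whitespace tokenization are my own assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis: str, reference: str, n: int) -> float:
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the
    reference ("clipping"), as in BLEU's modified precision.
    """
    # Assumption: whitespace split stands in for a real tokenizer.
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))

    total = sum(hyp_counts.values())
    if total == 0:
        # Hypothesis is shorter than n tokens: no n-grams to score.
        return 0.0

    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


if __name__ == "__main__":
    hyp = "where is the train station"
    ref = "where is the station"
    for n in (1, 2, 4):
        print(n, clipped_precision(hyp, ref, n))
```

For this sentence pair the unigram precision is 0.8 and the bigram precision 0.5, but the 4-gram precision is 0.0. Since BLEU takes a geometric mean over n-gram orders, an unsmoothed sentence-level score would be zero here, which illustrates why short chat-style sentences are hard for these metrics.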
In your case (real-time translation), you do not have (2), the reference translation. So you will have to estimate the performance of your system from features of your source sentence and your hypothesis translation, and from the knowledge you have about the MT process.
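To make "features of your source sentence and your hypothesis translation" a bit more concrete, here is a minimal sketch that computes a few simple surface features from a source/hypothesis pair. The feature names and the function `surface_features` are hypothetical, chosen for illustration; they are not a specific published feature set.

```python
import string


def surface_features(source: str, hypothesis: str) -> dict:
    """A few reference-free surface features for one translated sentence."""
    # Assumption: whitespace split stands in for a real tokenizer.
    src_tokens = source.split()
    hyp_tokens = hypothesis.split()

    return {
        # Sentence lengths in tokens.
        "src_len": len(src_tokens),
        "hyp_len": len(hyp_tokens),
        # Very long or very short output relative to the input can signal trouble.
        "len_ratio": len(hyp_tokens) / len(src_tokens) if src_tokens else 0.0,
        # Average source token length in characters.
        "src_avg_token_len": (
            sum(len(tok) for tok in src_tokens) / len(src_tokens) if src_tokens else 0.0
        ),
        # Punctuation counts (ASCII punctuation only) on both sides.
        "src_punct": sum(ch in string.punctuation for ch in source),
        "hyp_punct": sum(ch in string.punctuation for ch in hypothesis),
    }


if __name__ == "__main__":
    print(surface_features("Wo ist der Bahnhof ?", "Where is the station ?"))
```

In a Quality Estimation setup, feature vectors like this would typically be fed to a model trained on sentences that humans have already scored, so that it can predict a quality score for new output without a reference.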
A baseline system with 17 features is described in:
Quality Estimation is an active research topic. The most recent advances can be followed on the websites of the WMT Conferences. Look for the Quality Estimation shared tasks, for example http://www.statmt.org/wmt17/quality-estimation-task.html
Upvotes: 1