Can I feed translations back to Google Cloud Translation API to train it?

I am using the Google Translate neural network (amazing improvement) via the Google Cloud Translation API in SDL Trados to process technical translations.

Of course it needs heavy post-editing, mostly for terminology and sometimes for style. I would really like the neural network to learn from this post-editing, but there seems to be no way to feed my edits back. It is possible when using the web interface manually (translate.google.com). The Google Translator Toolkit (not updated in years) allowed the use of a shared public TM, but that is now obsolete with the neural network.

Can I somehow feed translations back to Google Cloud Translation API to train it?

Their FAQ states this:

"Does Google use my data for training purposes?

No, Google does not use the content you translate to train and improve our machine translation engine. In order to improve the quality of machine translation, Google needs parallel text - the content along with the human translation of that content."

Upvotes: 0

Answers (3)

Cecilia

Reputation: 11

Are you post-editing with a CAT tool?

There are a number of MT APIs that can be connected to CAT tools and TMSes, where you can batch-translate a single file or a group of files, or alternatively single segments, and do the post-editing there. Your post-editions are then uploaded to a TM.

I'm thinking that perhaps Fuzzy Repair (using MT to fix the differences in fuzzy TM matches) can help you. It's not exactly feeding translations back to the MT engine; it works on the translation tool end instead (and fixes the confidentiality issue, AFAIK).

I have tested Trados' and MemoQ's fuzzy repair features, and they work quite well! And these tools support different MT APIs, which can also help with customization (fine-tuning an MT model + fuzzy repair for real-time TM fixes).
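To make the idea of keeping post-edits in a TM concrete, here is a minimal Python sketch. It is not how Trados or MemoQ implement fuzzy matching or fuzzy repair; the `TranslationMemory` class, the `translate_segment` helper, the 0.85 threshold and the `fake_mt` stand-in are all hypothetical names chosen for illustration. It only shows the general pattern: reuse a stored post-edit when the new segment is close enough, otherwise fall back to MT.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy TM: stores post-edited segments and returns the closest match."""

    def __init__(self):
        self.entries = {}  # source segment -> post-edited target

    def add(self, source, target):
        self.entries[source] = target

    def best_match(self, source):
        """Return (score, stored_source, stored_target), or None if the TM is empty."""
        best = None
        for stored_source, stored_target in self.entries.items():
            score = SequenceMatcher(None, source, stored_source).ratio()
            if best is None or score > best[0]:
                best = (score, stored_source, stored_target)
        return best


def translate_segment(source, tm, machine_translate, threshold=0.85):
    """Prefer a TM match at or above the threshold, else fall back to MT."""
    match = tm.best_match(source)
    if match is not None and match[0] >= threshold:
        return match[2], "tm"
    return machine_translate(source), "mt"


def fake_mt(text):
    # Hypothetical stand-in for a real MT API call.
    return "[MT] " + text


tm = TranslationMemory()
tm.add("Tighten the bolt.", "Schraube festziehen.")  # a stored post-edit

print(translate_segment("Tighten the bolt.", tm, fake_mt))
print(translate_segment("Replace the filter.", tm, fake_mt))
```

A real CAT tool would also show the fuzzy score to the translator and, with fuzzy repair, try to patch the differing part of the match rather than reuse it unchanged.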

Hope it helps!

Upvotes: 0

Adam Bittlingmayer

Reputation: 1277

You are looking for adaptive machine translation.

An adaptive machine translation system learns from human feedback and adapts its output on the fly. Adaptive machine translation is applicable to post-editing workflows.

In adaptive machine translation, the system is customised while the human post-editor fixes the machine translation output, instead of after batch retraining.

Adaptive machine translation is an example of online machine learning and human-in-the-loop (HITL).
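As a rough illustration of that loop, here is a toy Python sketch. It is not the API of any of the products mentioned in this answer; `ToyAdaptiveEngine`, `post_edit_session` and the stand-in functions are hypothetical. The toy engine only memorises exact segments, whereas a real adaptive system generalises from the corrections, but the order of operations (translate, post-edit, learn immediately) is the point.

```python
class ToyAdaptiveEngine:
    """Toy engine that applies each correction as soon as it is received."""

    def __init__(self, base_translate):
        self.base_translate = base_translate  # stand-in for the underlying MT model
        self.corrections = {}                 # source segment -> post-edited target

    def translate(self, source):
        if source in self.corrections:
            return self.corrections[source]
        return self.base_translate(source)

    def learn(self, source, post_edited):
        # Online update: takes effect for the very next segment,
        # with no batch retraining step in between.
        self.corrections[source] = post_edited


def post_edit_session(engine, segments, post_edit):
    """Run the human-in-the-loop cycle and return (draft, final) pairs."""
    results = []
    for source in segments:
        draft = engine.translate(source)
        final = post_edit(source, draft)
        engine.learn(source, final)
        results.append((draft, final))
    return results


def base_mt(text):
    # Hypothetical stand-in for a generic MT engine.
    return text.upper()


def human(source, draft):
    # Hypothetical post-editor who only fixes one term.
    return {"valve": "Ventil"}.get(source, draft)


engine = ToyAdaptiveEngine(base_mt)
session = post_edit_session(engine, ["valve", "pump", "valve"], human)
for draft, final in session:
    print(draft, "->", final)
```

The second time "valve" comes up, the draft already carries the post-editor's fix, which is the behaviour the question is asking for.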

The Google Cloud Translation API does not support adaptive customization, but a few machine translation APIs do:

Amazon Translate

KantanMT

Language Weaver

Lilt

Mirai Translator

ModernMT

NpatMT

Omniscien Technologies

PangeaMT

Phrase NextMT

Sunda Translator

SYSTRAN

Tilde

Generally ModernMT is the easiest to get started with, followed by Amazon. In fact, they're easier than Google Cloud Translate; the main downside is that they support fewer languages.

Lilt was the pioneer of this approach, but machine translation APIs like Lilt, KantanMT or Language Weaver are tightly coupled to other technology or human translation service offerings.

Upvotes: 0

dsesto

Reputation: 8178

As you pointed out, the documentation regarding confidentiality highlights that Google does not use the data for training purposes as a background/transparent process, for the following reasons:

Confidentiality: for confidentiality reasons, the content submitted to the Translation API will not be used for training the model.
Non-feasibility: the neural network model behind the Translation API would need both the untranslated content and the translation suggested by the user in order to train on it, so it is not possible to train the model with the untranslated text alone.

Moreover, there is currently no way to suggest translations to the API in order to train the model in a more custom way.

As a side note, you might be interested in keeping an eye on AutoML, a new Google Cloud Platform product that is currently still in alpha, but to which you can request access by filling in the form on its main page. It will allow the creation of custom Machine Learning models without requiring the dedication and expertise that more complex products such as ML Engine require. The first product of the AutoML family to be launched will be AutoML Vision, but it is possible that similar products will appear for some of the other ML-related APIs in the Platform, such as the Translation API, which is the one you are interested in.

Also feel free to visit the Google Cloud Big Data and Machine Learning Blog from time to time in order to keep up to date with the latest news in this field. If you are interested in AutoML, its release and presentation will probably get an article on the blog too.

So as a summary: no, currently you cannot feed suggested translations back to the Translation API, but in the future you might be able to do so, or at least have your own custom models.

Upvotes: 1
