Reputation: 3
I am trying to use the Microsoft Cognitive Services Translator to translate text that contains XML-like tags:
<LABEL0>John</LABEL0> <LABEL1>Smith</LABEL1> is reading a <LABEL2>blue book</LABEL2>.
Can the NMT service handle this and preserve the tags in the translation?
Thanks
Upvotes: 0
Views: 283
Reputation: 389
You can preserve your XML tags by setting the textType parameter to "html" on the request, as described in the Translator API spec:
https://dev.microsofttranslator.com/translate?api-version=3.0&from=en&to=fr&category=generalnn&textType=html
For example, your sentence translated to French with textType set to html produces:
<LABEL0>John</LABEL0> <LABEL1>Smith</LABEL1> lit un <LABEL2>livre bleu</LABEL2>
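A request like this could be sketched in Python using only the standard library. The public endpoint `api.cognitive.microsofttranslator.com`, the helper names `build_translate_request` and `translate`, and the key and region arguments are assumptions for illustration; they are not taken from the request above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the global public Translator endpoint; replace it if yours differs.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(text, key, region, to_lang="fr", from_lang="en"):
    """Build a v3.0 translate request that treats the input as markup."""
    params = urllib.parse.urlencode({
        "api-version": "3.0",
        "from": from_lang,
        "to": to_lang,
        "textType": "html",  # ask the service to treat tags as markup, not text
    })
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # The region header is needed for regional or multi-service resources.
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    url = f"{ENDPOINT}?{params}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def translate(text, key, region, to_lang="fr"):
    """Send the request and return the text of the first translation."""
    request = build_translate_request(text, key, region, to_lang)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload[0]["translations"][0]["text"]
```

Calling `translate("<LABEL0>John</LABEL0> ...", key, region)` with a valid key should return the translated sentence with the tags still in place.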
Upvotes: 1
Reputation: 1683
I reproduced the scenario end to end and it worked. Here is the procedure.
Create source and destination folders in your storage container, and upload the XML file to the source folder.
Code:
# Install the SDK first: pip install azure-ai-translation-document==1.0.0
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient

key = "your key"
endpoint = "your endpoint"
sourceUrl = "source XML URL"  # location of the source folder holding the XML file
targetUrl = "target URL"  # location where translated documents are written
targetLanguage = "fr"  # assumption: French; begin_translation needs a target language code

client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))

# Start the translation job and block until it finishes
poller = client.begin_translation(sourceUrl, targetUrl, targetLanguage)
result = poller.result()

# Job-level summary
print("Status: {}".format(poller.status()))
print("Created on: {}".format(poller.details.created_on))
print("Last updated on: {}".format(poller.details.last_updated_on))
print("Total number of translations on documents: {}".format(poller.details.documents_total_count))
print("Of total documents...")
print("{} failed".format(poller.details.documents_failed_count))
print("{} succeeded".format(poller.details.documents_succeeded_count))

# Per-document results
for document in result:
    print("Document ID: {}".format(document.id))
    print("Document status: {}".format(document.status))
    if document.status == "Succeeded":
        print("Source document location: {}".format(document.source_document_url))
        print("Translated document location: {}".format(document.translated_document_url))
        print("Translated to language: {}\n".format(document.translated_to))
    else:
        print("Error Code: {}, Message: {}\n".format(document.error.code, document.error.message))
The translated document then appears in the target folder in the container.
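Once you have the translated text, one way to confirm that the tags survived is to compare the tag names in the source and the translation. The sketch below is only an illustration: the helper names `tag_names` and `tags_preserved` are hypothetical, and the regular expression is a simple approximation for flat, XML-like tags rather than a full XML parser. Tags are compared as a sorted list because word order can change between languages.

```python
import re

# Matches an opening or closing tag and captures its name, e.g. <LABEL0> or </LABEL0>.
TAG_PATTERN = re.compile(r"</?\s*([A-Za-z_][\w.-]*)[^>]*>")


def tag_names(markup):
    """Return the sorted names of all opening and closing tags in the markup."""
    return sorted(match.group(1) for match in TAG_PATTERN.finditer(markup))


def tags_preserved(source, translated):
    """Return True when both strings contain the same tags the same number of times."""
    return tag_names(source) == tag_names(translated)


source = "<LABEL0>John</LABEL0> <LABEL1>Smith</LABEL1> is reading a <LABEL2>blue book</LABEL2>."
translated = "<LABEL0>John</LABEL0> <LABEL1>Smith</LABEL1> lit un <LABEL2>livre bleu</LABEL2>"
print(tags_preserved(source, translated))
```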
Upvotes: 0