Reputation: 394
I'm using the Microsoft Translator Text API to translate parts of a webpage. The platform we use inserts &nbsp; into the HTML to render empty lines, so part of the webpage can look like this:
<p>
<span>This is a dummy text</span>
</p>
<p>
<span>&nbsp;</span>
</p>
When I send this to the Microsoft Translator Text API, it returns the following HTML:
<p>
<span>Il s’agit d’un texte factice</span>
</p>
<p>
<span></span>
</p>
I've set the content type to text/html and escape the HTML characters before sending the markup to the API (so each non-breaking space is sent as &nbsp;). But the text returned by the API has completely lost the &nbsp;.
How can I prevent the API from removing the &nbsp; instances in the HTML? Or is this a bug in the API?
Upvotes: 2
Views: 1303
Reputation: 663
See the answer to "Microsoft Translator API - notranslate trimming leading space?" from Chris Wendt (Microsoft):
Translator trims leading and trailing space, and compresses any other white space to a single space. This is by design. Translator needs to move the words around freely to form the newly composed sentence, and wouldn't know what to do with the extra white space. A workaround would be to trim in your code before translation, and then restore the trimmed off pieces afterwards, depending on the context.
Line breaks and non-breaking spaces tend to be used for specific line layout based on the particular source text that would need to be laid out differently in another language in any case because of different word lengths and arrangements of the significant words.
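The trim-and-restore workaround described in the quote could look roughly like the sketch below. It assumes you translate one text fragment at a time; `translate` is a hypothetical callable standing in for your real API call, not something provided by Microsoft. Fragments that contain nothing but whitespace or &nbsp; are returned unchanged and never sent to the API.

```python
import re

# Runs of whitespace at either edge. In Python 3, \s already matches the
# non-breaking space character U+00A0; the &nbsp; entity is added explicitly.
_EDGE = r"(?:\s|&nbsp;)*"
_SPLIT = re.compile(rf"({_EDGE})(.*?)({_EDGE})", re.DOTALL)


def translate_preserving_edges(text, translate):
    """Trim edge whitespace, translate the core, then restore the edges.

    `translate` is a hypothetical callable (str -> str) that wraps the
    actual translation request.
    """
    leading, core, trailing = _SPLIT.fullmatch(text).groups()
    if not core:
        # Only whitespace or &nbsp;: skip the API call and keep the text as-is.
        return text
    return leading + translate(core) + trailing


if __name__ == "__main__":
    # Stand-in translator used purely for demonstration.
    fake_translate = lambda s: "Il s'agit d'un texte factice"
    print(translate_preserving_edges("This is a dummy text&nbsp;", fake_translate))
    print(translate_preserving_edges("&nbsp;", fake_translate))
```

This only protects whitespace at the edges of a fragment; runs of &nbsp; in the middle of a sentence would still be compressed by the service.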
Upvotes: 1
Reputation: 278
A span with class="notranslate" may help to prevent translation. You would have to try it to see whether it does indeed preserve the &nbsp; entity.
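To try this out, you could mark the empty-line spans before sending the HTML. The following is a minimal sketch that assumes the spans look exactly like the ones in the question (no attributes, only whitespace or &nbsp; inside); the helper name `mark_nbsp_spans` is made up for illustration. Whether the service then keeps the &nbsp; is untested here.

```python
import re

# Matches attribute-less <span> elements whose content is only
# whitespace and/or &nbsp; entities.
_NBSP_ONLY_SPAN = re.compile(r"<span>((?:\s|&nbsp;)+)</span>")


def mark_nbsp_spans(html):
    """Add class="notranslate" to spans that only hold whitespace/&nbsp;."""
    return _NBSP_ONLY_SPAN.sub(r'<span class="notranslate">\1</span>', html)


if __name__ == "__main__":
    sample = "<p><span>This is a dummy text</span></p><p><span>&nbsp;</span></p>"
    print(mark_nbsp_spans(sample))
```

A regular expression is enough for this fixed pattern; for arbitrary markup an HTML parser would be the safer choice.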
Upvotes: 1