Tony Z

How to use Google Cloud translator for HTML text and preserve the line breaks?

I'm using the Google Cloud Translation API to translate some HTML text. I set the format to HTML, and the translation quality is pretty good (it leaves all the tags untouched and translates only the text between them). However, it often removes all the line breaks from the HTML. For example, with English to German selected, this input

<p><a class="selfLink" id="notes" href="#notes" rel="help"><strong>Notes</strong></a>
<ul>
<li><a class="selfLink" id="disclaimer" href="#disclaimer" rel="help">DISCLAIMER OF LIABILITY</a> 
...

becomes

<p><a class="selfLink" id="notes" href="#notes" rel="help"><strong>Anmerkungen</strong></a><ul><li> <a class="selfLink" id="disclaimer" href="#disclaimer" rel="help">...

The translated text is very hard to read because it all ends up on one line. I know I can set the format to "text" to preserve the line breaks, but in text mode the translator cannot recognize HTML tags, so it cannot tell which parts should be left untranslated. Adding the line breaks back by hand is not a practical option. What can I do to improve the readability of the HTML translation?

Answers (1)

savenkov

Disappearing newlines are one of the quirks of HTML mode; another is that some Unicode characters come back as HTML entities. You will run into it sooner or later :-)

The solution is to replace every newline with <br/> before sending the text to the Google Translate API and, after getting the translation back, to replace <br/> with newlines again and HTML-decode the result.
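A minimal Python sketch of that approach follows. It assumes the `google-cloud-translate` package's v2 client (`translate_v2.Client().translate(..., format_="html")`) with credentials already configured. The helper names are hypothetical. Matching `<br>`, `<br/>` and `<br />` on the way back is a defensive assumption, in case the returned tag is not byte-for-byte the one that was sent.

```python
import html
import re

# Matches <br>, <br/> and <br /> (case-insensitive), in case the tag comes
# back in a slightly different form than it was sent (an assumption).
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def newlines_to_br(source: str) -> str:
    """Protect line breaks before sending HTML to the API (hypothetical helper)."""
    # Normalize Windows line endings first so each break becomes one tag.
    return source.replace("\r\n", "\n").replace("\n", "<br/>")


def br_to_newlines(translated: str) -> str:
    """Restore line breaks and decode entities in the API's output (hypothetical helper)."""
    text = _BR_TAG.sub("\n", translated)
    # HTML mode can return some characters as entities; decode them.
    return html.unescape(text)


def translate_html_keep_lines(source: str, target: str = "de") -> str:
    """Hypothetical wrapper; requires google-cloud-translate and credentials."""
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import translate_v2 as translate

    client = translate.Client()
    result = client.translate(
        newlines_to_br(source), target_language=target, format_="html"
    )
    return br_to_newlines(result["translatedText"])
```

Two caveats with this approach: any `<br>` tags already present in the source will also come back as newlines, and unescaping the whole string also decodes entities such as `&lt;` that the original HTML may have relied on.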
