Jonatan

Reputation: 3992

I18N of XML documents

I'm about to decide how to handle internationalisation of an XML-based format for UI description.

The format typically looks something like this:

...
<devif>
  <screen id="scr1" title="Settings for this and that">
    <header text="Climate readings"/>
    <rd setp="123" text="Air temperature" unit="°C"/>
    <rd setp="234" text="Humidity" unit="%RH"/>
    <rd setp="345" text="CO2" unit="ppm"/>

    <header text="Settings"/>
    <wr setp="567" text="Air temperature demand" unit="°C"/>
  </screen>
  ...
</devif>

Each file contains many screens and can run to about 10,000 lines, and our application has a dozen of these files.

I can still change the format to best suit our needs. So how would you go about translating this?

I've been thinking about some possible ways to handle this:

The problem with the first solution is that the same English text might need to be translated into different messages depending on context.

The second solution makes the source file less readable (although not by much), and it does not handle translation of attributes easily.

The third solution would make the file very large and cumbersome to work with once the file has been translated into some 5-6 languages.

Upvotes: 5

Views: 2538

Answers (3)

Fernando Miguélez

Reputation: 11326

We use standard TMX files, which are XML files that hold internationalized literals. Each entry is identified by a label that is referenced throughout the code. Every entry holds all the available translations, and, most importantly, TMX is a standard format used by translation programs and professional translators.

If you already have XML files that hold your literals, you can convert them with an XSLT stylesheet.

Here is an example of the format:

<?xml version="1.0" encoding="UTF-8"?>
<tmx>
    <body>
        <tu tuid="$ALARM_BARCODE_READER_COMMS">
            <tuv lang="ES">
                <seg>Lector códigos de barras: No Operativo</seg>
            </tuv>
            <tuv lang="EN">
                <seg>Barcode reader: Not operative</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>读卡器:通讯错误</seg>
            </tuv>
        </tu>
        <tu tuid="$ALARM_BARCODE_READER_FAIL">
            <tuv lang="ES">
                <seg>Lector códigos de barras: Fallo</seg>
            </tuv>
            <tuv lang="EN">
                <seg>Barcode Reader: Fail</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>读卡器:故障</seg>
            </tuv>
        </tu>
        <tu tuid="$NO_PAYMENT_MODE_AVAILABLE">
            <tuv lang="ES">
                <seg>No hay sistemas de pago disponibles</seg>
            </tuv>
            <tuv lang="EN">
                <seg>No payment systems available</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>读卡器:故障</seg>
            </tuv>
        </tu>       
        <tu tuid="$ALARM_BLACKLIST_CARD">
            <tuv lang="ES">
                <seg>Tarjeta de pago en lista negra</seg>
            </tuv>
            <tuv lang="EN">
                <seg>Payment card in blacklist</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>付费的IC卡是黑名单卡</seg>
            </tuv>
        </tu>
    </body>
</tmx>
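
To show how labels like these might be resolved at run time, here is a minimal Python sketch. It assumes the simplified layout shown above (a `tuid` on each `tu` and a plain `lang` attribute on each `tuv`); the function names `load_catalog` and `translate` and the fallback-to-English rule are illustrative assumptions, not part of TMX or of our actual code.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the sample above, kept inline so the sketch is self-contained.
TMX_SAMPLE = """<tmx>
    <body>
        <tu tuid="$ALARM_BARCODE_READER_FAIL">
            <tuv lang="ES"><seg>Lector códigos de barras: Fallo</seg></tuv>
            <tuv lang="EN"><seg>Barcode Reader: Fail</seg></tuv>
        </tu>
    </body>
</tmx>"""


def load_catalog(tmx_text):
    """Return {tuid: {lang: text}} for every <tu> element in the document."""
    root = ET.fromstring(tmx_text)
    catalog = {}
    for tu in root.iter("tu"):
        catalog[tu.get("tuid")] = {
            tuv.get("lang"): tuv.findtext("seg", default="")
            for tuv in tu.findall("tuv")
        }
    return catalog


def translate(catalog, label, lang, fallback="EN"):
    """Look up a label; fall back to another language, then to the label itself."""
    entry = catalog.get(label, {})
    return entry.get(lang) or entry.get(fallback) or label


catalog = load_catalog(TMX_SAMPLE)
print(translate(catalog, "$ALARM_BARCODE_READER_FAIL", "ES"))
```

Returning the label itself when nothing matches makes untranslated entries visible in the UI instead of failing silently.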

Upvotes: 2

Raymond Yee

Reputation: 559

Why not use XML entities to define the terms that have to be localized, and then have a separate DTD for each language? This is essentially the approach used by Firefox XUL, for instance; see the Localization page on MDN.
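
A rough Python sketch of the idea, applied to the format from the question. XUL keeps the entity declarations in external per-locale DTD files; since Python's standard `xml.etree.ElementTree` does not load external DTDs, this sketch instead injects the declarations as an internal DTD subset. The entity names, the `ENTITIES` table, and the Spanish strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-language entity tables. In the XUL approach each of these
# would live in its own DTD file, one per locale.
ENTITIES = {
    "en": {"air.temperature": "Air temperature", "humidity": "Humidity"},
    "es": {"air.temperature": "Temperatura del aire", "humidity": "Humedad"},
}

# The UI description refers to entities instead of containing literal text.
UI_BODY = """<devif>
  <screen id="scr1">
    <rd setp="123" text="&air.temperature;" unit="°C"/>
    <rd setp="234" text="&humidity;" unit="%RH"/>
  </screen>
</devif>"""


def render(lang):
    """Parse the UI description with the entity set for one language."""
    # Values containing '"', '%' or '&' would need escaping before being
    # placed in an entity declaration; the sample values avoid them.
    decls = "".join(
        '<!ENTITY {} "{}">'.format(name, value)
        for name, value in ENTITIES[lang].items()
    )
    document = "<!DOCTYPE devif [{}]>\n{}".format(decls, UI_BODY)
    return ET.fromstring(document)


for rd in render("es").iter("rd"):
    print(rd.get("setp"), rd.get("text"), rd.get("unit"))
```

An entity that is missing from a language's table makes the parse fail, which gives an early check that every term has been translated.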

Upvotes: 1

ya23

Reputation: 14526

I would create a template file, e.g. named "somename.xml.template":

<devif>
  <screen id="scr1" title="[SettingsForThisCard]">
    <header text="[ClimateReadings]"/>
    <rd setp="123" text="[AirTemperature]" unit="°C"/>
...

Then you can create a bunch of ini-like files for each language containing:

SettingsForThisCard=your message in given language

Then you can replace the tags with the messages read from the ini files. The advantage is that a tag with no translation is easy to detect, so no translation effort is wasted. It is also very simple, so it may not be the best fit for your specific requirements.
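
The substitution step could look something like the Python sketch below. The helper names `parse_messages` and `fill`, the restriction of tag names to letters, digits and underscores, and the sample Spanish messages are assumptions made for the example; the template is the fragment above, closed off so that it is complete.

```python
import re
from xml.sax.saxutils import escape

TEMPLATE = """<devif>
  <screen id="scr1" title="[SettingsForThisCard]">
    <header text="[ClimateReadings]"/>
    <rd setp="123" text="[AirTemperature]" unit="°C"/>
  </screen>
</devif>"""

# Tags are assumed to look like [Name], with letters, digits and underscores only.
PLACEHOLDER = re.compile(r"\[([A-Za-z0-9_]+)\]")


def parse_messages(ini_text):
    """Read 'Key=message' lines; blank lines and '#' comments are skipped."""
    messages = {}
    for line in ini_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        messages[key.strip()] = value.strip()
    return messages


def fill(template, messages):
    """Return (filled_text, missing_tags); untranslated tags are left in place."""
    missing = []

    def substitute(match):
        key = match.group(1)
        if key not in messages:
            missing.append(key)
            return match.group(0)
        # Escape the message so it stays valid inside an XML attribute value.
        return escape(messages[key], {'"': "&quot;"})

    return PLACEHOLDER.sub(substitute, template), missing


spanish = parse_messages(
    "SettingsForThisCard=Ajustes de esta tarjeta\n"
    "AirTemperature=Temperatura del aire\n"
)
filled, missing = fill(TEMPLATE, spanish)
print(filled)
print("Untranslated tags:", missing)
```

The list of missing tags is what makes untranslated entries easy to spot: here `ClimateReadings` has no Spanish message, so it is reported and left in the output as `[ClimateReadings]`.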

Upvotes: 1
