Siw Meckelborg

Reputation: 45

Localization with XSLT: do any standards exist for handling different languages in XML?

I am working on a project where I need to create an HTML file supporting several languages from an XSLT transformation. I have read several articles about this, and have also looked at previous questions here on Stack Overflow, like this one: xslt localization

The solution of putting the translations in a separate XML document works just as I want it to. But I wonder whether there is any standard or best practice for the "translate.xml" file. In the post referenced above, the following is given as an example:

<strings>
    <string language="en" key="ContactDetails">Contact Details</string>
    <string language="sv" key="ContactDetails">Kontaktuppgifter</string>
    [...]
</strings>   

As I said, the suggested solution of using keys to retrieve the strings from translate.xml works as I want it to, but I like to use standards where they are available. So my question is whether there is a standardized schema for these types of XML files, or some kind of best practice for naming the tags etc. in such a "translate.xml".

Upvotes: 2

Views: 648

Answers (1)

Abel

Reputation: 57169

Good question!

Yes, there is a standardized way of dealing with languages: use the xml:lang attribute. The xml prefix is implicitly bound in every XML document, so no namespace declaration is needed, and the attribute is defined by the core XML specification. Its value is a language tag according to RFC 4646 (since superseded by RFC 5646; both are part of BCP 47). These are the often-seen tags like en, en-US, es, es-BR, which specify the main language and, optionally, a variant of it.

The way it works is a bit like namespaces. If you define it on an element, it is inherited by all descendants of that element, unless you redefine it, or undefine it with xml:lang="" to denote language-independent elements.

For instance:

<text xml:lang="en">
    We are 
    <t xml:lang="en-US">organizing</t>
    <t xml:lang="en-GB">organising</t>
    a conference on the effects of 
    <t xml:lang="en-US">color</t>
    <t xml:lang="en-GB">colour</t>
    in December this year.
</text>

Using a transformation language such as XSLT with the copy idiom (an identity template), this works excellently together with the lang() function. It looks up the applicable language on the context node or the nearest ancestor that carries xml:lang, and returns true if that language matches the argument. Matching is done on the main language as well: lang('en') is also true when the language in scope is a variant such as en-US.

The following assumes XSLT 1.0, but works with 2.0 and 3.0 as well (this code was kindly corrected by Michael, see comments):

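<!-- match US English, plus content whose language is exactly the default "en" -->
<!-- (lang('en') alone would also match en-GB, so the default is compared exactly) -->
<xsl:template match="t[lang('en-US') or ancestor-or-self::*[@xml:lang][1]/@xml:lang = 'en']">
    <xsl:apply-templates />
</xsl:template>

<!-- remove any other <t> -->
<xsl:template match="t" />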
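For reference, here is a minimal sketch of a complete stylesheet that combines the copy idiom with lang(). The parameter name variant is my own invention, not something from the question. Because XSLT 1.0 does not allow variable references inside match patterns, the language test sits in an xsl:if inside the template rather than in a predicate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the language variant to keep, e.g. en-US or en-GB -->
  <xsl:param name="variant" select="'en-US'"/>

  <!-- Identity template (the copy idiom): copies everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For <t>, keep only the content whose language in scope matches $variant;
       the <t> wrapper itself is dropped in either case -->
  <xsl:template match="t">
    <xsl:if test="lang($variant)">
      <xsl:apply-templates/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample above with the default parameter, this should keep "organizing" and "color" and drop the en-GB alternatives; passing variant=en-GB flips the choice. Passing plain en would keep both variants, since lang('en') is true for any en-* language.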

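Note: always set a default language on the outermost element, as lang() returns false when no xml:lang is in scope at all. To test for that case, look for the attribute directly, for example with not(ancestor-or-self::*/@xml:lang). The expression lang('') is not a reliable test for this: it is also false for an ordinary value such as en, and is true only when the nearest xml:lang is the empty string.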
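One way to find elements that have no language in scope, without depending on lang() at all, is to inspect the nearest xml:lang attribute directly. The following is only a sketch; the message text is made up, and it treats an explicit xml:lang="" the same as a missing attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Select every element whose nearest xml:lang (on itself or an ancestor)
         is absent or empty; string() of an empty node-set is '' -->
    <xsl:for-each
        select="//*[string(ancestor-or-self::*[@xml:lang][1]/@xml:lang) = '']">
      <xsl:value-of select="concat('no language in scope: ', name(), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If the outermost element carries a non-empty xml:lang and nothing below it resets the language to the empty string, this prints nothing.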

About your XML files: if you have one file per language, and don't mix and match as suggested above, you can still use the same approach by setting xml:lang on the root element. Since it is then inherited throughout the whole tree, you can still use the lang() function.
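As a rough sketch of the one-file-per-language layout, the strings file from the question could carry xml:lang on its root instead of a custom language attribute on every entry. The file naming scheme (translate.sv.xml) and the parameter name language are assumptions of mine, not any standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- translate.sv.xml (hypothetical file name): one file per language -->
<strings xml:lang="sv">
    <string key="ContactDetails">Kontaktuppgifter</string>
</strings>
```

A stylesheet could then load the matching file and look the string up by key:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter selecting the output language -->
  <xsl:param name="language" select="'sv'"/>

  <!-- Assumed naming scheme: translate.<language>.xml next to the stylesheet -->
  <xsl:variable name="strings"
      select="document(concat('translate.', $language, '.xml'))/strings"/>

  <xsl:template match="/">
    <!-- lang() is evaluated on each <string>, which inherits xml:lang from the root -->
    <h1>
      <xsl:value-of
          select="$strings/string[@key = 'ContactDetails'][lang($language)]"/>
    </h1>
  </xsl:template>

</xsl:stylesheet>
```

With one language per file the lang() predicate is redundant, but it keeps the same lookup working if several languages are later merged into a single file with xml:lang set on each string.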

Upvotes: 1
