Abhi

Reputation: 11

How to convert Latin characters to UTF-8 format in XSLT 2.0?

I have an XML document that contains Latin characters such as é, ä, å, ß and ö. I have to transform this XML into a CSV file using XSLT 2.0 and convert these Latin characters to UTF-8 format. I have used a character map to map these five characters to e, ae, aa, ss and oe respectively, but the input XML can contain many other characters like these. Is there a way to convert such characters using some encoding? Any leads would be really helpful.

Regards, Abhi

Upvotes: 1

Views: 1398

Answers (2)

Michael Kay

Reputation: 163312

You can strip accents by first converting strings to decomposed normal form (in which the accents are represented by separate codepoints), and then stripping the accents using the replace() function:

replace(normalize-unicode($in, 'NFD'), 
        '\p{IsCombiningDiacriticalMarks}', '')

That doesn't solve cases like ß and æ, but it will get you a long way.

(Also: this strips accents from accented letters. But it has nothing to do with your question title, which is about conversion to UTF-8. I suspect you are confused about the actual requirements.)
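
As a rough sketch of how the expression above might sit in a complete stylesheet: the `my` namespace URI and the function name `my:strip-accents` are hypothetical, and it assumes the processor supports the NFD normalization form, which the specification leaves optional. The `encoding="UTF-8"` attribute on `xsl:output` only controls how the result is serialized; it does not change any characters.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <!-- Plain-text result, serialized as UTF-8 -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper: decompose to NFD, then drop the combining marks -->
  <xsl:function name="my:strip-accents" as="xs:string">
    <xsl:param name="in" as="xs:string?"/>
    <xsl:sequence select="replace(normalize-unicode($in, 'NFD'),
                                  '\p{IsCombiningDiacriticalMarks}', '')"/>
  </xsl:function>

  <!-- Apply the helper to every text node reached by the built-in rules -->
  <xsl:template match="text()">
    <xsl:value-of select="my:strip-accents(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

This only copies the text content with accents removed; producing real CSV rows would still need templates that match the actual element names, which are not shown in the question.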

Upvotes: 3

Amrendra Kumar

Reputation: 1816

You can use the translate() function, giving the replacement for each character at the corresponding position:

<xsl:value-of select="translate(., '&#x00E4;&#x00DF;','&#x0061;&#x0073;')"/>
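
Note that translate() maps each character to at most one character, so it cannot turn ß into ss. A minimal sketch of one way around that, assuming nested replace() calls are acceptable for a short list of characters (the particular replacements chosen here are illustrative only):

```xml
<!-- ß -> ss, æ -> ae, Æ -> AE; each replace() handles one multi-letter mapping -->
<xsl:value-of select="replace(replace(replace(., '&#x00DF;', 'ss'),
                                      '&#x00E6;', 'ae'),
                              '&#x00C6;', 'AE')"/>
```

For a longer list this gets unwieldy, and a character map, as already used in the question, may be easier to maintain.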

Upvotes: 0
