Reputation: 11
I have an XML document containing Latin characters such as é, ä, å, ß, ö, and I need to transform it into a CSV file using XSLT 2.0, converting these Latin characters to the UTF-8 format. I have used a character map to map these 5 characters to e, ae, aa, ss, oe respectively, but the input XML can contain many other characters like these. Is there a way to convert these characters using some encoding? Any leads would be really helpful.
Regards, Abhi
Upvotes: 1
Views: 1398
Reputation: 163312
You can strip accents by first converting strings to decomposed normal form (in which the accents are represented by separate codepoints), and then stripping the accents using the replace() function:
replace(normalize-unicode($in, 'NFD'),
'\p{IsCombiningDiacriticalMarks}', '')
That doesn't solve cases like ß and æ, but it will get you a long way.
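As a rough sketch of how this could sit inside a stylesheet that writes CSV: the function name f:strip-accents, the namespace urn:example:functions, and the element name row are all made up for illustration, and the two extra replace() calls for ß and æ are only one possible way to cover the cases that decomposition misses.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Text output; the serializer writes the result as UTF-8 -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper: expand ß and æ, then strip combining accents -->
  <xsl:function name="f:strip-accents" as="xs:string">
    <xsl:param name="in" as="xs:string?"/>
    <!-- Characters with no canonical decomposition are handled explicitly -->
    <xsl:variable name="expanded"
        select="replace(replace($in, 'ß', 'ss'), 'æ', 'ae')"/>
    <!-- NFD separates base letters from accents; the regex removes the accents -->
    <xsl:sequence
        select="replace(normalize-unicode($expanded, 'NFD'),
                        '\p{IsCombiningDiacriticalMarks}', '')"/>
  </xsl:function>

  <!-- Assumes each (hypothetical) row element holds one child element per CSV field -->
  <xsl:template match="/">
    <xsl:for-each select="//row">
      <xsl:value-of select="for $c in * return f:strip-accents($c)"
                    separator=","/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Note that this turns ä into a and ö into o, not ae and oe. If you need those expansions, they would have to be added as further explicit replacements before the normalization step.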
(Also: this strips accents from accented letters. But it has nothing to do with your question title, which is about conversion to UTF-8. I suspect you are confused about the actual requirements.)
Upvotes: 3
Reputation: 1816
You can use the translate() function, supplying a mapping between the corresponding characters:
<xsl:value-of select="translate(., 'äß','aB')"/>
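Keep in mind that translate() maps characters one-for-one by position, so it cannot turn a single character into two (ß to ss, for example). A possible workaround, sketched here with an arbitrary choice of characters, is to expand such characters with replace() first and let translate() handle the rest:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Applied to every text node reached by the built-in templates -->
  <xsl:template match="text()">
    <!-- replace() expands ß to ss; translate() then maps ä, ö, å, é to a, o, a, e -->
    <xsl:value-of select="translate(replace(., 'ß', 'ss'), 'äöåé', 'aoae')"/>
  </xsl:template>

</xsl:stylesheet>
```

The mapping strings have to be extended by hand for every additional character, so this only suits a small, known set.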
Upvotes: 0