1owk3y

Reputation: 1202

Parsing unicode in unescaped XML

I'm trying to parse some poorly formatted XML.

I say poorly formatted because everyone knows you're not supposed to have unescaped ampersands in an XML file.

Problem is, I need to collect some Unicode-formatted phrases from an XML file, and I need them to stay as close to the original as possible. You can reproduce the issue in your browser console:

console.log($("<test>&#xE2;</test>").text())
// Outputs 'â' instead of desired '&#xE2;'

I've tried every combination of escape(), unescape(), encodeURI(), and decodeURI() I can fathom.

I've tried both settings for jQuery's ajax({processData: bool}) flag. All the answers I've found point to these solutions, and none of them seem to work.

How can I modify the above code to output the original XML content?

Upvotes: 0

Views: 319

Answers (1)

Will

Reputation: 1281

Use new Option(yourUnescapedXml).innerHTML. So, to answer your question directly:

console.log($(`<test>${new Option('&#xE2;').innerHTML}</test>`).text())

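This creates an HTMLOptionElement whose text is the raw string, then immediately reads its innerHTML, in which the ampersand is escaped as &amp;amp;. When jQuery parses that escaped markup, .text() returns the original characters, &#xE2;, instead of the decoded 'â'.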

Upvotes: 1
