Chaitanya MSV

Reputation: 6784

Read UTF-8 hex codes in xml using JavaScript

I have an XML file that contains En Dash and Em Dash characters as part of the element text. They are being converted to UTF-8 byte codes as follows:

<TextValue>This is an En Dash:  \xE2\x80\x93    This is an Em Dash: \xE2\x80\x94.</TextValue>

I would like to match those UTF-8 hex codes using JavaScript and replace them with any text I want.

Could anyone suggest an approach? I tried a RegEx but could not match those codes, although I can match any other text with a RegEx.

Thank you.

Upvotes: 0

Views: 1247

Answers (2)

Chaitanya MSV

Reputation: 6784

I finally got around this by reading the body of the message as UTF-8 and using the following lines to replace the Unicode characters.

body = body.replace(/\u00E1/g,"a");  //LATIN SMALL LETTER A WITH ACUTE
body = body.replace(/\u00E2/g,"a");  //LATIN SMALL LETTER A WITH CIRCUMFLEX
body = body.replace(/\u00E3/g,"a");  //LATIN SMALL LETTER A WITH TILDE
body = body.replace(/\u201D/g,"\"");  //RIGHT DOUBLE QUOTATION MARK
body = body.replace(/\u201C/g,"\"");  //LEFT DOUBLE QUOTATION MARK
body = body.replace(/\u2424/g," ");  //SYMBOL FOR NEWLINE (U+2424); an actual line feed \n is \u000A
body = body.replace(/\u000D/g," ");  //CARRIAGE RETURN \r
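
Once the body has been decoded as UTF-8, the dashes from the question are single characters as well (U+2013 and U+2014), so they can be handled the same way. A minimal sketch of doing all replacements in one pass; the `replacements` table and the `replaceChars` helper are hypothetical names, and the replacement texts are only examples:

```javascript
// Hypothetical lookup table: character to replace -> replacement text.
const replacements = {
  "\u2013": "-",   // EN DASH
  "\u2014": "--",  // EM DASH
  "\u201C": "\"",  // LEFT DOUBLE QUOTATION MARK
  "\u201D": "\"",  // RIGHT DOUBLE QUOTATION MARK
};

// Build one character class from the table keys and replace every match in a single pass.
// This assumes none of the keys is a regex metacharacter inside a character class.
function replaceChars(body) {
  const pattern = new RegExp("[" + Object.keys(replacements).join("") + "]", "g");
  return body.replace(pattern, (ch) => replacements[ch]);
}

console.log(replaceChars("En Dash: \u2013 Em Dash: \u2014"));
```

Adding another character then only needs a new entry in the table rather than another `replace` line.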

Upvotes: 0

mplungjan

Reputation: 178350


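var text = "<TextValue>This is an En Dash:  \xE2\x80\x93    This is an Em Dash: \xE2\x80\x94.</TextValue>";

// UTF-8 byte sequences (one character per byte) and their plain-text replacements
var fromArr = ["\xe2\x80\x98", "\xe2\x80\x99", "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x93", "\xe2\x80\x94", "\xe2\x80\xa6"],
    toArr = ["'", "'", '"', '"', '-', '--', '...'];

for (var i = 0; i < fromArr.length; i++) {
    // split/join replaces every occurrence; the non-standard third "flags"
    // argument of String.prototype.replace is ignored by current engines
    text = text.split(fromArr[i]).join(toArr[i]);
}
alert(text);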

Change to

 var fromArr = ["\xe2\x80\x93", "\xe2\x80\x94"], toArr = [ '-', '--'];

if you do not need the smartquotes and ellipsis
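
If the goal is to recover the real characters instead of listing every byte sequence, the bytes can be decoded as UTF-8 in one step. This is only a sketch under two assumptions: `TextDecoder` is available (modern browsers and recent Node.js), and the helper names `decodeUtf8Bytes` and `decodeLiteralEscapes` are hypothetical. The first helper assumes every character in the string holds one byte (code 0 to 255), as in the string literal above. The second assumes the text contains the escapes literally, that is, a backslash, an `x` and two hex digits.

```javascript
// Case 1: each character holds one byte value (for example "\xE2\x80\x93" in a JS literal).
// Characters above 255 would be truncated, so use this only on byte-valued strings.
function decodeUtf8Bytes(text) {
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// Case 2: the text contains literal escapes such as \xE2\x80\x93 (backslash, x, two hex digits).
// Each run of consecutive escapes is turned into bytes and decoded as UTF-8.
function decodeLiteralEscapes(text) {
  return text.replace(/(?:\\x[0-9A-Fa-f]{2})+/g, (run) => {
    const bytes = run.match(/[0-9A-Fa-f]{2}/g).map((hex) => parseInt(hex, 16));
    return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
  });
}

console.log(decodeUtf8Bytes("En Dash: \xE2\x80\x93"));
console.log(decodeLiteralEscapes("Em Dash: \\xE2\\x80\\x94"));
```

After decoding, the dashes are ordinary characters (`\u2013` and `\u2014`) and can be matched with a normal RegEx or string replace.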


Upvotes: 1
