peter

Reputation: 357

Convert &# values to Unicode characters

I have lots of characters in the form &#182; which I would like to display as Unicode characters in my text editor. This ought to convert them:

var newtext = doctext.replace(
    /&#(\d+);/g, 
    String.fromCharCode(parseInt("$1", 10))
);

But it doesn't seem to work. The regular expression /&#(\d+);/ gets the numbers out, but String.fromCharCode does not appear to give the results I'd like. What is up?

Upvotes: 2

Views: 1236

Answers (2)

Amadan

Reputation: 198436

The replace method is not foolproof if you are handling full HTML (i.e. you don't control what the input is). For example, the method submitted by Jack (and obviously the idea in the original post as well) works excellently if your entities are all decimal, but doesn't work for hex references like &#x41;, and even less for named entities like &quot;.

For this, there is another trick you can do: create an element, set its innerHTML to the source, then read out its text value. Basically, browsers know what to do with entities, so we delegate. :) In jQuery it is easy:

$('<div/>').html('&amp;').text()
// => "&"

With plain JS it gets a bit more verbose:

var el = document.createElement('div');
el.innerHTML = '&amp;';
el.textContent
// => "&"
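Where no DOM is available (for example in Node.js), the regex approach can be stretched to cover hex references and a handful of named entities. The following is a minimal sketch under that assumption: the decodeEntities name and the NAMED table are hypothetical, and the table is a tiny illustrative subset, not the full HTML named-entity list.

```javascript
// Hypothetical subset of named entities; the real HTML list is far larger.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0" };

// Returns the character for a numeric reference, or leaves the text
// untouched if the value is outside the Unicode range.
function fromCode(n, match) {
  return n <= 0x10FFFF ? String.fromCodePoint(n) : match;
}

// Hypothetical helper: decodes &#182; (decimal), &#xB6; (hex) and the
// named entities in NAMED in a single pass over the text.
function decodeEntities(text) {
  return text.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));/g,
    (match, dec, hex, name) => {
      if (dec !== undefined) return fromCode(parseInt(dec, 10), match);
      if (hex !== undefined) return fromCode(parseInt(hex, 16), match);
      // Unknown names are left as-is; hasOwnProperty avoids prototype keys.
      return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match;
    }
  );
}

console.log(decodeEntities("&#182; &#x41; &quot;hi&quot; &foo;"));
// → ¶ A "hi" &foo;
```

Because everything is decoded in one pass, "&amp;amp;" becomes "&amp;" rather than "&", which matches how a browser decodes it.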

Upvotes: 2

Ja͢ck

Reputation: 173642

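The replacement part should be an anonymous function instead of an expression. As written, String.fromCharCode(parseInt("$1", 10)) is evaluated once, before replace even runs; "$1" is only special inside a replacement string, so parseInt("$1", 10) is NaN. A function, by contrast, is called for each match and receives the captured digits as an argument: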

var newtext = doctext.replace(
    /&#(\d+);/g,
    // Called once per match: $0 is the whole match, $1 the captured digits.
    function($0, $1) {
        return String.fromCharCode(parseInt($1, 10));
    }
);
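A small sketch contrasting the two forms on a sample string (the doctext value here is made up for illustration). It also swaps in String.fromCodePoint, which is not in the answer above: String.fromCharCode truncates its argument to 16 bits, so a reference such as &#128512; (U+1F600) would come out as the wrong character.

```javascript
const doctext = "&#182; and &#128512;";

// Original attempt: the second argument is computed once, up front.
// parseInt("$1", 10) is NaN and String.fromCharCode(NaN) is "\u0000",
// so every match becomes a NUL character.
const broken = doctext.replace(/&#(\d+);/g, String.fromCharCode(parseInt("$1", 10)));

// Function form: called once per match with the captured digits.
// String.fromCodePoint handles code points above U+FFFF; values beyond
// U+10FFFF would make it throw, so those matches are kept unchanged.
const fixed = doctext.replace(/&#(\d+);/g, (match, digits) => {
  const code = parseInt(digits, 10);
  return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
});

console.log(JSON.stringify(broken)); // "\u0000 and \u0000"
console.log(fixed);                  // ¶ and 😀
```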

Upvotes: 6
