user3221449

Reputation: 103

How to convert string of mixed latin and unicode characters

I have a number of strings that mix Latin text with Cyrillic characters encoded as numeric character references (for example `&#1043;`). I need a JavaScript function that converts these strings into human-readable form. Here is what I came up with:

var EGstr = '&#1043;&#1088;&#1080;&#1092; Kettler &#1087;&#1088;&#1103;&#1084;&#1086;&#1081;';
var newStr = EGstr.replace(/&#(\d+);/g, String.fromCharCode('$1') );

I expected this to work, but it doesn't. How should I change the code?

Upvotes: 0

Views: 904

Answers (2)

Minko Gechev

Reputation: 25672

You can use:

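var d = document.createElement('div');
// Let the browser's HTML parser decode the character references.
d.innerHTML = '&#1043;&#1088;&#1080;&#1092; Kettler &#1087;&#1088;&#1103;&#1084;&#1086;&#1081;';
// textContent returns the decoded text; innerHTML would re-escape &, < and >.
alert(d.textContent); // Гриф Kettler прямой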

instead of a regex.

Or, wrapped in a function:

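function getText(txt) {
  var d = document.createElement('div');
  // The HTML parser decodes character references such as &#1043;
  d.innerHTML = txt;
  // Return the decoded text rather than re-serialized HTML
  return d.textContent;
}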

Upvotes: 1

nhahtdh

Reputation: 56819

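In your code, String.fromCharCode('$1') is evaluated once, before replace is even called. Since '$1' converts to NaN, it returns the NUL character ("\u0000"), and every match is replaced by that character. To compute the replacement for each match, supply a replacement function to the replace method: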

var newStr = EGstr.replace(/&#(\d+);/g, function(_, $1) {
    return String.fromCharCode($1);
});

The 1st argument to the replacement function is the text that matches the whole expression (which we don't need here).

The 2nd argument onwards are the strings captured by the capturing groups, in order.

The second-to-last and last arguments are the offset of the match and the source string, respectively (we don't need them here either, so I don't declare them in the replacement function).
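To make the argument order above concrete, here is a minimal sketch. The helper name decodeNumericEntities and its optional log parameter are my own, not part of the original code, and I've used String.fromCodePoint instead of String.fromCharCode on the assumption that code points above 0xFFFF might occur in the input (it throws a RangeError for values that are not valid code points):

```javascript
// Hypothetical helper (not from the question): decodes decimal character
// references such as "&#1043;" and optionally records what replace()
// passes to the replacement function.
function decodeNumericEntities(input, log) {
  return input.replace(/&#(\d+);/g, function (match, digits, offset, source) {
    // match  -> the whole matched text, e.g. "&#1043;"
    // digits -> the first capturing group, e.g. "1043" (a string)
    // offset -> index of the match within source
    // source -> the full string replace() was called on
    if (log) {
      log.push({ match: match, digits: digits, offset: offset });
    }
    // fromCodePoint handles code points above 0xFFFF, which
    // fromCharCode would truncate to 16 bits.
    return String.fromCodePoint(Number(digits));
  });
}

var calls = [];
var decoded = decodeNumericEntities('&#1043;&#1088;&#1080;&#1092; Kettler', calls);
console.log(decoded);   // Гриф Kettler
console.log(calls[1]);  // match "&#1088;", digits "1088", offset 7
```

The function is called once per match, so four references produce four calls, and unmatched text such as " Kettler" is left untouched.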

Upvotes: 1
