George Hernando

Reputation: 2630

Javascript Convert Unicode Embedded strings

I'm getting a string from an API method I have no control over, and it returns paths that sometimes look like: /toplevel/nextlevel/_x0034_33name/myoutput.log

It seems to happen with directory names that start with a number. In this case the directory name should be '433name'.

I'm guessing that x0034 is the hex code for the character '4', possibly as a Unicode code point.

The following Javascript returns '4', which would be correct:

String.fromCharCode(parseInt("0034",16))
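Running the conversion in both directions can help show why `_x0034_` stands for '4'. This is only an illustrative sketch: `hexToChar` and `charToHex` are hypothetical helper names, and it assumes each escape holds exactly four hex digits (one UTF-16 code unit).

```javascript
// Hypothetical helper: four hex digits -> one UTF-16 code unit.
function hexToChar(hex) {
  return String.fromCharCode(parseInt(hex, 16));
}

// Hypothetical helper: one character -> its code unit as four hex digits.
function charToHex(ch) {
  return ch.charCodeAt(0).toString(16).padStart(4, "0");
}

console.log(parseInt("0034", 16)); // 52, the decimal code of '4'
console.log(hexToChar("0034"));    // "4"
console.log(charToHex("4"));       // "0034"
```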

Is there a regex or a conversion utility readily available in Javascript to replace all of these escape sequences in the string with the characters they represent?

Upvotes: 0

Views: 129

Answers (3)

Jon Cooke

Reputation: 92

'/toplevel/nextlevel/_x0034_33name/myoutput.log'.replace(/_x[\da-f]{4}_/gi, function (match) {
    // match looks like "_x0034_"; characters 2-5 are the hex digits
    return String.fromCharCode(parseInt(match.substr(2, 4), 16));
});

This will work as long as every encoded sequence follows the same _xHHHH_ format (underscore, x, four hex digits, underscore).
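To make the format constraint concrete: the regex only decodes sequences that are exactly `_x`, four hex digits and `_`; anything else passes through unchanged. The sketch below wraps the answer's regex in a hypothetical `decode` function and uses `slice(2, 6)`, which extracts the same four characters as `substr(2, 4)`. The sample paths are made up.

```javascript
// Same pattern as the answer above: "_x", exactly four hex digits, "_".
function decode(path) {
  return path.replace(/_x[\da-f]{4}_/gi, function (match) {
    // match is e.g. "_x0034_"; indexes 2..5 hold the hex digits.
    return String.fromCharCode(parseInt(match.slice(2, 6), 16));
  });
}

console.log(decode("/a/_x0034_33name/b.log")); // "/a/433name/b.log"
console.log(decode("/a/_x004A_b"));            // "/a/Jb" (uppercase hex is fine with the i flag)
console.log(decode("/a/_x034_name"));          // unchanged: only three hex digits
```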

Upvotes: 0

Jongware

Reputation: 22447

Your diagnosis is okay but a bit off: the encoded part is not just the 'x', it's the entire string _xhhhh_, including both underscores.

Try this:

var x = '/toplevel/nextlevel/_x0034_33name/myoutput.log';
var y = x.replace(/_x([0-9A-F]{4})_/gi, function (a, b) { return String.fromCharCode(parseInt(b, 16)); });

Then y will hold the decoded path.
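This version differs from the first answer by using a capture group, `([0-9A-F]{4})`, so the callback receives the hex digits directly instead of slicing them out of the match. The sketch below records the callback's first two arguments to show this; `seen` is just an illustrative variable.

```javascript
var re = /_x([0-9A-F]{4})_/gi;

// replace() passes the full match first, then each capture group,
// so here a is "_x0034_" and b is "0034".
var seen = [];
var y = '/toplevel/nextlevel/_x0034_33name/myoutput.log'.replace(re, function (a, b) {
  seen.push([a, b]);
  return String.fromCharCode(parseInt(b, 16));
});

console.log(seen); // [ [ '_x0034_', '0034' ] ]
console.log(y);    // '/toplevel/nextlevel/433name/myoutput.log'
```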

(As F.J. says, this may need the i (ignore case) flag, which is why the regex above uses it. It's hard to say with such a limited set of test data.)

Upvotes: 0

Andrew Clark

Reputation: 208405

function unescapeApi(string) {
    return string.replace(/_x([\da-f]{4})_/gi, function(match, p1) {
        return String.fromCharCode(parseInt(p1, 16));
    });
}

// Example: logs '/433name/myoutput.log'
console.log(unescapeApi('/_x0034_33name/myoutput.log'));
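A couple of extra calls to `unescapeApi` (repeated here so the snippet runs on its own) show that the `g` flag decodes every escape in the path and that underscores outside an `_xHHHH_` sequence are preserved. The sample paths are invented for illustration.

```javascript
function unescapeApi(string) {
  return string.replace(/_x([\da-f]{4})_/gi, function (match, p1) {
    return String.fromCharCode(parseInt(p1, 16));
  });
}

// Every escape is replaced (g flag), in any path segment.
console.log(unescapeApi('/_x0031_st/_x0032_nd/run.log')); // '/1st/2nd/run.log'
// Underscores that are not part of an _xHHHH_ escape are left alone.
console.log(unescapeApi('/my_dir/log_x_file.txt'));       // '/my_dir/log_x_file.txt'
```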

Upvotes: 2
