xTheEc0

Reputation: 47

UTF-8 literal into actual symbolised string

First, I apologize: English is not my first language, so sorry if 'symbolised string' or similar terms don't actually make sense.

My situation: I am reading Google's supported devices CSV file (https://support.google.com/googleplay/answer/1727131?hl=en) with Node.js like so:
const { readFileSync } = require('fs');
const lines = readFileSync(PATH, 'utf16le').split('\n');
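A self-contained sketch of the reading step, assuming (as the question does) that the CSV is UTF-16LE encoded. The temp file path and its two-line content are hypothetical stand-ins for the real download, and splitting on /\r?\n/ is an assumption in case the file uses Windows line endings.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for the real CSV location.
const PATH = path.join(os.tmpdir(), 'supported_devices_sample.csv');

// Hypothetical sample content: a header plus the line from the question,
// which contains a literal backslash escape (not the actual character).
fs.writeFileSync(PATH, 'Model\r\nY6 \\xe2\\x85\\xa1 Compact\r\n', 'utf16le');

// Read as UTF-16LE and split into lines; /\r?\n/ also handles CRLF endings.
const lines = fs.readFileSync(PATH, 'utf16le').split(/\r?\n/);

console.log(lines[1]); // Y6 \xe2\x85\xa1 Compact
```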

One of the lines looks like this: Y6 \xe2\x85\xa1 Compact
The \xe2\x85\xa1 part is a UTF-8 string literal that stands for Ⅱ (U+2161, Roman numeral two), at least according to this table: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8448&number=128&utf8=string-literal&text=8545
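The table's reading can be checked directly with Node's built-in Buffer: the three bytes E2 85 A1 decode as UTF-8 to a single character, U+2161. A minimal sketch:

```javascript
// The bytes from the escape sequence, as actual numbers.
const bytes = Buffer.from([0xe2, 0x85, 0xa1]);

// Decoding them as UTF-8 yields one character.
const numeral = bytes.toString('utf8');
console.log(numeral);                             // Ⅱ
console.log(numeral.codePointAt(0).toString(16)); // 2161

// The reverse direction: encoding U+2161 as UTF-8 gives back the same bytes.
console.log(Buffer.from('\u2161', 'utf8').toString('hex')); // e285a1
```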

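When I try to work with this string, Node shows it as \\xe2\\x85\\xa1 (with doubled backslashes). The file contains literal backslash characters, and \x escapes are only interpreted in JavaScript source code, not in text read from a file.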
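The difference between an escape in source code and the same characters read from a file can be sketched as below. String.raw is used here only to simulate file content (it keeps the backslashes literally); JSON.stringify shows the quoted, escaped form, which is why the backslashes appear doubled. Note that even in source code, '\xe2\x85\xa1' would give three separate characters rather than Ⅱ, because \xNN escapes denote single UTF-16 code units, not UTF-8 bytes.

```javascript
// In source code, '\xe2' is an escape: one character, U+00E2.
const fromSource = '\xe2';
console.log(fromSource.length); // 1

// Three \x escapes in source give three characters, not the Roman numeral.
console.log('\xe2\x85\xa1' === '\u2161'); // false

// Text read from a file keeps the literal backslashes; String.raw simulates that.
const fromFile = String.raw`Y6 \xe2\x85\xa1 Compact`;
console.log(fromFile.length);          // 23
console.log(fromFile);                 // Y6 \xe2\x85\xa1 Compact
console.log(JSON.stringify(fromFile)); // "Y6 \\xe2\\x85\\xa1 Compact"
```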

Is there any way to get the actual Roman numeral?

If not, are there any suggestions for easily stripping such sequences out completely? (The Roman numeral is more of a 'nice to have' than a 'must'.)
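For the stripping fallback, a regular expression that matches a literal backslash, an x, and exactly two hex digits is one option. A minimal sketch; stripEscapes is a hypothetical helper name, and collapsing repeated whitespace afterwards is an assumption (it would also collapse any double spaces that were already in the data).

```javascript
// One literal "\x" followed by exactly two hex digits.
const ESCAPE = /\\x[0-9a-fA-F]{2}/g;

// Hypothetical helper: removes every \xNN sequence, then tidies the
// whitespace the removal leaves behind.
function stripEscapes(text) {
  return text.replace(ESCAPE, '').replace(/\s{2,}/g, ' ').trim();
}

console.log(stripEscapes(String.raw`Y6 \xe2\x85\xa1 Compact`)); // Y6 Compact
console.log(stripEscapes('no escapes here'));                 // no escapes here
```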

Upvotes: 0

Views: 433

Answers (1)

xTheEc0

Reputation: 47

As often happens, I came up with a solution about 5 minutes after posting...

Replace each \x with %:
\xe2\x85\xa1 => %e2%85%a1
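The replacement step on its own, as a small sketch (String.raw again stands in for text read from the file). In the regex literal, \\x matches a literal backslash followed by x, and the g flag is what makes every occurrence get replaced:

```javascript
const raw = String.raw`\xe2\x85\xa1`;

// With the g flag, every \x becomes %.
const percentEncoded = raw.replace(/\\x/g, '%');
console.log(percentEncoded); // %e2%85%a1

// Without the g flag, only the first occurrence is replaced.
console.log(raw.replace(/\\x/, '%')); // %e2\x85\xa1
```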

Then decode the result with decodeURIComponent():
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent
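decodeURIComponent interprets %XX sequences as UTF-8 bytes, which is why this works; encodeURIComponent goes the other way. One caveat: it throws a URIError on malformed input, such as a stray % or an incomplete UTF-8 sequence. A minimal sketch with a guard; safeDecode is a hypothetical helper, and returning the input unchanged on failure is an assumption about what the caller wants.

```javascript
console.log(decodeURIComponent('%e2%85%a1')); // Ⅱ
console.log(encodeURIComponent('\u2161'));    // %E2%85%A1

// Hypothetical helper: falls back to the original text on malformed input.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) return text; // leave malformed text as-is
    throw err;
  }
}

console.log(safeDecode('%e2%85%a1')); // Ⅱ
console.log(safeDecode('100%'));      // 100%   (stray % would otherwise throw)
console.log(safeDecode('%e2%85'));    // %e2%85 (incomplete UTF-8 sequence)
```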

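A UTF-8 literal like this is essentially URI percent-encoding, just with \x instead of %. So we replace every \x with % (the g flag catches all occurrences in the string) and decode the result as a URI component to get back a proper string:
const decoded = decodeURIComponent(string.replace(/\\x/g, '%'));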
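If the data might contain a literal % (or an existing sequence like %20 that should stay as-is), decoding the whole line as a URI component can fail or change unrelated text. An alternative, swapped in here and not the approach above, is to decode only runs of \xNN escapes, turning them into bytes with Buffer. A sketch; decodeUtf8Escapes is a hypothetical helper and 'Save 100% battery' is a made-up input.

```javascript
// A run of one or more consecutive \xNN escapes.
const ESCAPE_RUN = /(?:\\x[0-9a-fA-F]{2})+/g;

// Hypothetical helper: decodes each escape run as UTF-8, leaves all other text untouched.
function decodeUtf8Escapes(text) {
  return text.replace(ESCAPE_RUN, (run) => {
    // "\xe2\x85\xa1" -> "e285a1" -> bytes [0xe2, 0x85, 0xa1]
    const hex = run.replace(/\\x/g, '');
    return Buffer.from(hex, 'hex').toString('utf8');
  });
}

console.log(decodeUtf8Escapes(String.raw`Y6 \xe2\x85\xa1 Compact`)); // Y6 Ⅱ Compact
console.log(decodeUtf8Escapes('Save 100% battery'));                 // Save 100% battery
```

Invalid byte sequences come out as the replacement character U+FFFD instead of throwing, so this variant never needs a try/catch.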

Spent over an hour trying to figure this out, posted the question, and came up with the solution in 5 minutes... What is even life...

Upvotes: 1
