I want to apologize up front: English is not my first language, so sorry if 'symbolised string' or something similar doesn't actually make sense.
My situation: I am reading the Google supported devices CSV file (https://support.google.com/googleplay/answer/1727131?hl=en) with Node.js like so:
const { readFileSync } = require('fs');
const lines = readFileSync(PATH, 'utf16le').split('\n');
One of the lines looks like: Y6 \xe2\x85\xa1 Compact
The \xe2\x85\xa1 part is an escaped UTF-8 byte sequence which actually stands for Ⅱ (Roman numeral two), at least according to this: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8448&number=128&utf8=string-literal&text=8545
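To double-check that claim, here is a minimal sketch using Node's built-in Buffer: it builds the three raw bytes E2 85 A1 and decodes them as UTF-8, which should give the single character U+2161.

```javascript
// The three bytes written out in the escape sequence \xe2\x85\xa1
const bytes = Buffer.from([0xe2, 0x85, 0xa1]);

// Decoding those bytes as UTF-8 yields one character
const decoded = bytes.toString('utf8');

console.log(decoded); // Ⅱ
console.log(decoded.codePointAt(0).toString(16)); // 2161
```

The point is that the CSV line does not contain these bytes; it contains the text backslash, x, e, 2, and so on, which is why it has to be converted first.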
When I try to work with this string, Node shows it as \\xe2\\x85\\xa1 with the backslashes doubled, because the file contains a literal backslash followed by x rather than a real escape sequence.
Is there any way I could actually get the Roman numeral?
If not, any suggestions for easily stripping such data out completely? (The Roman numeral is more of a 'cool to have' than a 'must'.)
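For the fallback of stripping the data out, a minimal sketch could remove every literal \xNN escape (backslash, x, two hex digits) with a regular expression. The helper name stripHexEscapes is hypothetical, and the sketch assumes the escapes always take exactly that form.

```javascript
// Hypothetical helper: drop every literal "\xNN" escape from a line,
// then squash the extra space left where the escapes used to be.
function stripHexEscapes(line) {
  return line
    .replace(/\\x[0-9a-fA-F]{2}/g, '') // remove each escaped byte
    .replace(/ {2,}/g, ' ')            // collapse runs of spaces
    .trim();
}

// In JS source the backslash itself must be escaped to build this test string
const sample = 'Y6 \\xe2\\x85\\xa1 Compact';
console.log(stripHexEscapes(sample)); // Y6 Compact
```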
As often happens, I came up with a solution about 5 minutes after posting...
\xe2\x85\xa1
=> replace \x with %
=> %e2%85%a1
=> decodeURIComponent() (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent)
=> Ⅱ
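An escaped UTF-8 literal is basically a percent-encoded URI component, just with \x instead of %.
So we replace every \x with % (the g flag catches all instances in the string) and then decode the result as a URI component to get back a proper string:
const decoded = decodeURIComponent(string.replace(/\\x/g, '%'));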
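A slightly more defensive variant of the same replace plus decodeURIComponent idea is sketched below. The function name decodeHexEscapes is hypothetical. It only rewrites \x when two hex digits follow, and it falls back to the original line because decodeURIComponent throws a URIError on malformed input, for example a line that already contains a stray % (such as "100%"). Note that any valid %NN sequence already present in a line would also be decoded.

```javascript
// Hypothetical helper: turn literal "\xNN" escapes into "%NN",
// then let decodeURIComponent reassemble the UTF-8 bytes.
function decodeHexEscapes(line) {
  // Capture the two hex digits so other backslashes are left untouched
  const percentEncoded = line.replace(/\\x([0-9a-fA-F]{2})/g, '%$1');
  try {
    return decodeURIComponent(percentEncoded);
  } catch {
    // URIError: malformed percent sequence or incomplete UTF-8 bytes,
    // so keep the line as it was
    return line;
  }
}

const raw = 'Y6 \\xe2\\x85\\xa1 Compact';
console.log(decodeHexEscapes(raw)); // Y6 Ⅱ Compact
```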
Spent over an hour trying to figure this out, posted the question, and came up with a solution in 5 minutes... What is even life...