Reputation: 2658
I have an XML file encoded in UTF-16, and I would like to convert it to UTF-8 in order to process it. If I use this command:
iconv -f UTF-16 -t UTF-8 file.xml > converted_file.xml
The file is converted correctly and I'm able to process it. I want to do the same in Node.js.
Currently I have my file in a Buffer, and I've tried everything I could think of and everything I could find on the internet, without success.
Here are some examples of what I've tried so far:
content = new Buffer((new Buffer(content, 'ucs2')).toString('utf8'));
I've also tried using these functions:
http://jonisalonen.com/2012/from-utf-16-to-utf-8-in-javascript/ https://stackoverflow.com/a/14601808/1405208
The first attempt doesn't change anything, and the functions from the links only give me Chinese characters.
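The attempt above ends with `.toString('utf8')` on bytes that are still UTF-16, so they get misread. The conversion has to go the other way: decode the bytes as UTF-16, then encode the resulting string as UTF-8. A minimal sketch with Node's Buffer API, assuming `content` is a Buffer holding the raw file bytes. The function name `utf16ToUtf8` is hypothetical, and treating input without a BOM as little-endian is an assumption (it is what Node's `'utf16le'`/`'ucs2'` means).

```javascript
// Sketch: roughly what `iconv -f UTF-16 -t UTF-8` does, for a Buffer in memory.
// Assumption: input without a byte order mark is little-endian.
function utf16ToUtf8(input) {
  // Copy first so swap16() below does not mutate the caller's buffer
  const bytes = Buffer.from(input);
  let start = 0;
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    // Big-endian BOM: swap each byte pair so the data becomes little-endian.
    // swap16() throws a RangeError if the length is odd.
    bytes.swap16();
    start = 2; // skip the (now swapped) BOM
  } else if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    start = 2; // little-endian BOM: just skip it
  }
  // Decode UTF-16LE into a JS string, then encode that string as UTF-8
  const text = bytes.toString('utf16le', start);
  return Buffer.from(text, 'utf8');
}

// Usage: BOM + "hé" in UTF-16LE
const le = Buffer.from([0xff, 0xfe, 0x68, 0x00, 0xe9, 0x00]);
console.log(utf16ToUtf8(le)); // <Buffer 68 c3 a9>
```

The result can then be parsed directly, or turned back into a string with `.toString('utf8')`.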
Upvotes: 3
Views: 6856
Reputation: 86
While the answer above is the best answer to the question as asked, I hope this one will help anyone who needs to read a file as a binary string:
const reader = new FileReader();
reader.readAsBinaryString(this.fileToImport);
In my case the file was in UTF-16 and I tried to read it into XLSX:
const wb = XLSX.read(bstr, { type: "binary" });
Combining both links from above, I first removed the first two characters, which mark the file as UTF-16LE (the byte order mark, bytes 0xFF 0xFE), then used this link to compute the right code point for each character: https://stackoverflow.com/a/14601808/1405208
Lastly, I applied the second link to get the right sequence of UTF-8 bytes: https://stackoverflow.com/a/14601808/1405208
The code I ended up with:
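The two steps described above (UTF-16 code units to a code point, then code point to UTF-8 bytes) can be checked by hand on a single character. A small sketch; the helper names `fromSurrogates` and `toUtf8Bytes` are hypothetical and not taken from either link. U+1F600 is stored in UTF-16 as the surrogate pair 0xD83D 0xDE00.

```javascript
// Step 1: combine a UTF-16 surrogate pair into one code point.
// Each surrogate carries 10 bits; together they hold (codePoint - 0x10000).
function fromSurrogates(high, low) {
  return 0x10000 + (((high & 0x3ff) << 10) | (low & 0x3ff));
}

// Step 2: encode a code point (up to 0x10FFFF) as UTF-8 bytes.
function toUtf8Bytes(cp) {
  if (cp < 0x80) return [cp]; // 1 byte: 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes
  if (cp < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const cp = fromSurrogates(0xd83d, 0xde00);
console.log(cp.toString(16)); // 1f600
console.log(toUtf8Bytes(cp).map((b) => b.toString(16))); // [ 'f0', '9f', '98', '80' ]
```

Note that the low surrogate must come from the *next* code unit; combining the high surrogate with itself gives a wrong code point.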
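decodeUTF16LE(binaryStr) {
  // Only handle input that starts with the UTF-16LE byte order mark (bytes 0xFF 0xFE)
  if (binaryStr.charCodeAt(0) !== 0xff || binaryStr.charCodeAt(1) !== 0xfe) {
    return binaryStr;
  }
  const utf8 = [];
  // Each UTF-16LE code unit is two bytes: low byte first, then high byte
  for (let i = 2; i + 1 < binaryStr.length; i += 2) {
    let charcode = binaryStr.charCodeAt(i) | (binaryStr.charCodeAt(i + 1) << 8);
    if (charcode < 0x80) {
      utf8.push(charcode);
    } else if (charcode < 0x800) {
      utf8.push(0xc0 | (charcode >> 6), 0x80 | (charcode & 0x3f));
    } else if (charcode < 0xd800 || charcode >= 0xe000) {
      utf8.push(0xe0 | (charcode >> 12), 0x80 | ((charcode >> 6) & 0x3f), 0x80 | (charcode & 0x3f));
    } else {
      // Surrogate pair: UTF-16 encodes 0x10000-0x10FFFF by subtracting 0x10000
      // and splitting the 20 bits of 0x0-0xFFFFF into two halves.
      // The low surrogate is the next code unit, i.e. the next two bytes.
      const low = binaryStr.charCodeAt(i + 2) | (binaryStr.charCodeAt(i + 3) << 8);
      i += 2;
      charcode = 0x10000 + (((charcode & 0x3ff) << 10) | (low & 0x3ff));
      utf8.push(
        0xf0 | (charcode >> 18),
        0x80 | ((charcode >> 12) & 0x3f),
        0x80 | ((charcode >> 6) & 0x3f),
        0x80 | (charcode & 0x3f)
      );
    }
  }
  // Build the result in chunks: apply() with a very large array can exceed the argument limit
  let out = "";
  for (let j = 0; j < utf8.length; j += 0x8000) {
    out += String.fromCharCode.apply(null, utf8.slice(j, j + 0x8000));
  }
  return out;
},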
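Where `TextDecoder` and `TextEncoder` are available (modern browsers, and Node 11+ as globals), the same byte-level conversion can be delegated to them instead of hand-rolled bit twiddling. A sketch of that swapped-in approach, assuming the input is a binary string (one character per byte, as `readAsBinaryString` produces) and that a binary string of UTF-8 bytes is wanted back, the same shape `decodeUTF16LE` returns. The name `utf16leBinaryToUtf8Binary` is hypothetical. Unlike `decodeUTF16LE`, it decodes even when no BOM is present, and unpaired surrogates become U+FFFD.

```javascript
// Sketch: UTF-16LE binary string -> UTF-8 binary string via TextDecoder/TextEncoder.
function utf16leBinaryToUtf8Binary(binaryStr) {
  // Recover the raw bytes (every char in a binary string is 0-255)
  const bytes = Uint8Array.from(binaryStr, (ch) => ch.charCodeAt(0));
  // TextDecoder skips a leading BOM by default
  const text = new TextDecoder('utf-16le').decode(bytes);
  const utf8 = new TextEncoder().encode(text);
  // Back to a binary string, in chunks to stay under argument-count limits
  let out = '';
  for (let i = 0; i < utf8.length; i += 0x8000) {
    out += String.fromCharCode.apply(null, utf8.subarray(i, i + 0x8000));
  }
  return out;
}

// Usage in Node, building a UTF-16LE binary string (BOM + "hé") by hand:
const sample = Buffer.from('\ufeffhé', 'utf16le').toString('latin1');
console.log(Buffer.from(utf16leBinaryToUtf8Binary(sample), 'latin1')); // <Buffer 68 c3 a9>
```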
Upvotes: 3
Reputation: 2062
const fs = require('fs');
// 'ucs2' is Node's alias for UTF-16LE
const content = fs.readFileSync('myfile.xml', { encoding: 'ucs2' });
fs.writeFileSync('myfile.xml', content, { encoding: 'utf8' });
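One thing to watch with this approach: Node's `'ucs2'` decoding does not remove a leading byte order mark, so a U+FEFF at the start of the file is written out as the UTF-8 bytes EF BB BF, whereas `iconv -f UTF-16` consumes the BOM. A sketch that drops it by decoding with `TextDecoder` instead (it skips the BOM by default). The function name `convertUtf16leFileToUtf8` is hypothetical, and the source is assumed to be little-endian.

```javascript
const fs = require('fs');

// Sketch: convert a UTF-16LE file to UTF-8 without carrying the BOM over.
// Assumption: the source file is little-endian UTF-16.
function convertUtf16leFileToUtf8(srcPath, destPath) {
  const bytes = fs.readFileSync(srcPath); // raw Buffer, no decoding yet
  // TextDecoder drops a leading U+FEFF by default, unlike buffer.toString('ucs2')
  const text = new TextDecoder('utf-16le').decode(bytes);
  fs.writeFileSync(destPath, text, 'utf8');
}
```

Reading the whole file into memory is fine for typical XML documents; very large files would call for a streaming approach.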
Upvotes: 5