Reputation: 4094
I have to read a file encoded in UTF-16 using Node.js (in chunks, because it is very large). The data from the file will go into MongoDB, so I will need to convert it to UTF-8. From googling, it seems that Node just plain doesn't support this, and that I will have to convert the raw data from a buffer myself. But I also think there ought to be a better way that I'm just not finding. Any suggestions?
Thanks.
Upvotes: 28
Views: 24129
Reputation: 123058
Replace the normal utf8 you'd have when reading a text file with utf16le or ucs2:
var fileContents = fs.readFileSync('import.csv','utf16le')
or:
var fileContents = fs.readFileSync('import.csv','ucs2')
Also, for anyone searching the internet: if additional � (question mark) characters are appearing in a parsed file, this is probably the cause of your problem. Read the file as UTF-16/UCS-2 and the extra characters will disappear.
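Since the question asks about reading in chunks, here is a rough sketch of one way to do that instead of readFileSync. It assumes the file is little-endian UTF-16; readUtf16InChunks and its parameters are made-up names for illustration, not part of Node. It relies on StringDecoder to hold back any bytes that a chunk boundary cuts in half, so a chunk size that is odd or splits a surrogate pair should not corrupt the text.

```javascript
const fs = require('fs');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: reads a UTF-16LE file chunk by chunk and passes
// each decoded piece of text to onText as an ordinary JavaScript string.
function readUtf16InChunks(path, chunkSize, onText) {
  const decoder = new StringDecoder('utf16le');
  const buffer = Buffer.alloc(chunkSize);
  const fd = fs.openSync(path, 'r');
  try {
    let bytesRead;
    // position = null means "continue from the current file position"
    while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null)) > 0) {
      // The decoder keeps incomplete code units until the next write
      const text = decoder.write(buffer.subarray(0, bytesRead));
      if (text.length > 0) onText(text);
    }
    // Flush whatever is left (empty for a well-formed file)
    const rest = decoder.end();
    if (rest.length > 0) onText(rest);
  } finally {
    fs.closeSync(fd);
  }
}
```

Once decoded, the text is a normal JavaScript string, so Buffer.from(text, 'utf8') gives you UTF-8 bytes if you need them explicitly. Note that a leading byte order mark, if the file has one, will show up as '\uFEFF' at the start of the first piece, so you may want to strip it.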
Upvotes: 45
Reputation: 4623
Node supports UCS-2, the UTF-16 subset supported by JavaScript. Try using that.
See this pull request.
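In case the UCS-2 vs UTF-16 distinction is a worry: in current Node.js, 'ucs2' is treated as an alias of 'utf16le', so characters outside the Basic Multilingual Plane should come through either way. A quick check, with a sample string made up for illustration:

```javascript
// Sample text containing a character outside the Basic Multilingual Plane
const sample = 'naïve 😀';

// Encode to UTF-16LE bytes, as they would appear in the file
const bytes = Buffer.from(sample, 'utf16le');

// Decode with both encoding names
const viaUtf16 = bytes.toString('utf16le');
const viaUcs2 = bytes.toString('ucs2');

// Both decodes should give back the original string
console.log(viaUtf16 === sample, viaUcs2 === sample);

// Re-encode as UTF-8 bytes, e.g. before writing the text elsewhere
const utf8Bytes = Buffer.from(viaUcs2, 'utf8');
console.log(utf8Bytes.toString('utf8') === sample);
```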
Upvotes: 25