Jesse Vlietveld

Reputation: 393

UTF-16 Hex Decode NodeJS

I am trying to decode a UTF-16 Hex (Hello 世界) to a String in NodeJS. I've tried doing so by making a buffer from the hex:

let vari = new Buffer.from('00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C', 'hex').toString();

But when I console.log 'vari' I do not get the correct result (or any visible output at all). I've tried passing 'utf8' and 'utf16le' to the toString method, but that doesn't seem to work either. Can anyone point me in the right direction?

Upvotes: 3

Views: 1208

Answers (2)

Mr. Doge

Reputation: 886

I came here from: node.js fastest way to decode hex string as utf-16

The hex 48 00 65 00 6c 00 6c 00 6f 00 20 00 16 4e 4c 75, decoded as utf16le, is "Hello 世界".

You have 00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C.
Let's compare the two:
48 00 65 00 6c 00 6c 00 6f 00 20 00 16 4e 4c 75 decoded as utf16le is 'Hello 世界'
00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C decoded as utf16be is 'Hello 世界'
The only difference is that the two bytes of every 16-bit code unit are swapped.

In other words, the two sequences hold the same code units in opposite byte order.

Either you need to reverse the byte order manually, or you need a function like .toString('utf16be').
be (Big Endian) is the opposite of le (Little Endian).
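
The byte-order difference can be checked directly in Node. This is a minimal sketch that only uses Buffer.from, buf.toString('hex') and buf.swap16(); the variable names are made up for illustration:

```javascript
// Encode the same text and look at its bytes in both byte orders.
const text = 'Hello 世界'

// Node's built-in 'utf16le' encoding produces the little-endian bytes.
const le = Buffer.from(text, 'utf16le')
const leHex = le.toString('hex')

// Copy the buffer first, because swap16() modifies its buffer in place.
const be = Buffer.from(le).swap16()
const beHex = be.toString('hex')

console.log(leHex) // 480065006c006c006f002000164e4c75
console.log(beHex) // 00480065006c006c006f00204e16754c
```

The second line matches the hex from the question once its spaces are removed, which is why it has to be treated as big-endian.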

But node.js doesn't provide buf.toString('utf16be').

To reverse the byte order manually, there's buf.swap16(): https://nodejs.org/api/buffer.html#buffer_buf_swap16

One convenient use of buf.swap16() is to perform a fast in-place conversion between UTF-16 little-endian and UTF-16 big-endian:

const myHexStr = '00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C'
// Buffer.from(..., 'hex') stops at the first non-hex character, so strip the spaces first
const removedSpaces = myHexStr.replace(/ /g, '') // '00480065006C006C006F00204E16754C'
const buf = Buffer.from(removedSpaces, 'hex') // <Buffer 00 48 00 65 00 6c 00 6c 00 6f 00 20 4e 16 75 4c>
// swap16() swaps in place and returns the same buffer; it throws if the length is not a multiple of 2
const reversedBuf = buf.swap16() // <Buffer 48 00 65 00 6c 00 6c 00 6f 00 20 00 16 4e 4c 75>
const backToJavascriptString = reversedBuf.toString('utf16le') // 'Hello 世界'

This can be simplified to one line:

Buffer.from('00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C'.replace(/ /g,''),'hex').swap16().toString("utf16le")
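
As an alternative to swapping bytes, the WHATWG TextDecoder that Node exposes as a global can decode big-endian input directly. This is a different technique from buf.swap16(), not something Buffer itself offers, and it is only a sketch: it assumes a Node build with ICU support (the default for official builds), since 'utf-16be' may be unavailable otherwise. The variable names are illustrative.

```javascript
// Assumption: this Node.js build has ICU enabled, so 'utf-16be' is a known TextDecoder label.
const hexInput = '00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C'

// Strip the spaces so that every character is a hex digit, then parse to bytes.
const bytes = Buffer.from(hexInput.replace(/ /g, ''), 'hex')

// decode() reads the bytes as big-endian UTF-16 and leaves the buffer untouched.
const decoded = new TextDecoder('utf-16be').decode(bytes)

console.log(decoded) // Hello 世界
```

Unlike swap16(), this does not modify the buffer, which may matter if the bytes are reused elsewhere.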

Here is a better way to do it (performance-wise):
Thankfully node.js does provide a function to read a value as big endian: buf.readUInt16BE().
It only reads 2 bytes at a time, but a UTF-16 code unit is also exactly 2 bytes.

My implementation:
Read every code unit with readUInt16BE into an array of fixed length, then turn the array into a string.

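function ubeFunc(hexx4) {
  const buf = Buffer.from(hexx4, "hex")
  const len = buf.length >> 1 // number of 16-bit code units (a trailing odd byte is ignored)
  const arr = new Array(len)
  for (let i = 0; i < len; i++) {
    arr[i] = buf.readUInt16BE(i * 2) // read one big-endian code unit
  }
  // note: spreading a very large array can exceed the engine's argument limit
  return String.fromCharCode(...arr)
}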

How you'd use it:

ubeFunc('00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C'.replace(/ /g,'')) // 'Hello 世界'

This was the fastest approach overall; the details are here: node.js fastest way to decode hex string as utf-16
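
One caveat with String.fromCharCode(...arr) is that very long inputs can hit the engine's limit on the number of function arguments. A possible variant, sketched below, converts the code units in chunks. The function name decodeUtf16BeHexChunked and the default chunk size are assumptions made for this example, and it has not been benchmarked against the version above.

```javascript
// Hypothetical helper: the name and the default chunkSize are illustrative assumptions.
function decodeUtf16BeHexChunked(hex, chunkSize = 8192) {
  // Remove any whitespace so that Buffer.from does not stop early.
  const buf = Buffer.from(hex.replace(/\s+/g, ''), 'hex')
  const len = buf.length >> 1 // number of 16-bit code units
  const parts = []
  for (let start = 0; start < len; start += chunkSize) {
    const end = Math.min(start + chunkSize, len)
    const units = new Array(end - start)
    for (let i = start; i < end; i++) {
      units[i - start] = buf.readUInt16BE(i * 2) // one big-endian code unit
    }
    // Each call receives at most chunkSize arguments.
    parts.push(String.fromCharCode.apply(null, units))
  }
  // Joining the pieces also reunites surrogate pairs split across chunks.
  return parts.join('')
}

console.log(decodeUtf16BeHexChunked('00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C')) // Hello 世界
```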

Upvotes: 1

domsim1

Reputation: 640

It's not working because the hex string you pass to Buffer.from contains spaces. With the 'hex' encoding, decoding stops at the first character that is not a hex digit, so almost none of your input ends up in the buffer. If you run console.log(Buffer.from('00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C', 'hex')) you will see a buffer holding only the first byte (<Buffer 00>), not the 16 bytes you expect.
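
To see what Buffer.from actually returns for the spaced string, here is a quick sketch that compares it with the same input after the spaces are removed (variable names are illustrative):

```javascript
const spaced = '00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C'

// Hex decoding stops at the first non-hex character, here the first space.
const truncated = Buffer.from(spaced, 'hex')
console.log(truncated.length) // 1

// With the spaces removed, all 16 bytes are decoded.
const full = Buffer.from(spaced.replace(/ /g, ''), 'hex')
console.log(full.length) // 16
```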

Also, '00 48 00 65 00 6C 00 6C 00 6F 00 20 4E 16 75 4C' is not the little-endian (UTF-16LE) hex representation of "Hello 世界"; its bytes are in big-endian order. When decoded as utf16le, it is this: 䠀攀氀氀漀 ᙎ䱵. 48 00 65 00 6c 00 6c 00 6f 00 20 00 16 4e 4c 75 is "Hello 世界" in UTF-16LE hex; I got this from running console.log(Buffer.from('Hello 世界', 'utf16le'));.

To answer the question on how you can convert '48 00 65 00 6c 00 6c 00 6f 00 20 00 16 4e 4c 75' back to "Hello 世界" you can do the following:

let hexStrings = '48 00 65 00 6c 00 6c 00 6f 00 20 00 16 4e 4c 75'.split(' '); // split into two-character hex chunks
let hex = hexStrings.map(x => parseInt(x, 16)); // parse each hex chunk into a byte value (0-255)
let buffer = Buffer.from(hex); // create a buffer from the array of byte values
let string = buffer.toString('utf16le'); // decode the little-endian bytes into a string
console.log(string); // output -> Hello 世界

Hope this helps!

Upvotes: 2
