Andrew Bullock

Reputation: 37436

Encode string as UTF-16 to Base64 in JavaScript

I'm struggling to find any resources on this online, which is concerning. I've been reading about UCS-2 and UTF-16 woes, but I can't find a solution.

I need to get a value from an input:

var val = $('input').val()

and encode it to Base64, treating the text as UTF-16, so:

this is a test

becomes:

dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==

and not the below, which you get treating it as UTF-8:

dGhpcyBpcyBhIHRlc3Q=

Upvotes: 3

Views: 4340

Answers (1)

Your data, once read into JavaScript, is a string: a sequence of 16-bit code units with no byte-level encoding attached (no byte order, no BOM). It is not normalised in any way, and each number identifies a character, not a glyph. So: if you specifically need the data encoded as a UTF-16 byte sequence, produce that byte sequence first, then Base64-encode it.

But here's the fun part: which UTF-16 do you need? Little or big endian? With or without a BOM? (The expected output in your question corresponds to little endian with no BOM.) UTF-16 is a really inconvenient encoding format, and we're not even going to touch UCS-2; it has been obsolete for a long time.
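If you do need UTF-16 bytes, here is a minimal sketch of one way to produce them. The function name `utf16ToBase64` and its `littleEndian` parameter are made up for illustration; it assumes an environment where `btoa` is available (browsers, or a recent Node.js) and writes no BOM.

```javascript
// Sketch: encode a string as UTF-16 bytes (no BOM), then Base64-encode those bytes.
// `utf16ToBase64` and `littleEndian` are hypothetical names, not a built-in API.
function utf16ToBase64(str, littleEndian = true) {
  const bytes = new Uint8Array(str.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < str.length; i++) {
    // charCodeAt returns one 16-bit code unit, so surrogate pairs
    // are written as two consecutive units, as UTF-16 requires.
    view.setUint16(i * 2, str.charCodeAt(i), littleEndian);
  }
  // btoa expects a "binary string": one character per byte.
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

console.log(utf16ToBase64('this is a test'));
// dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==
```

Passing `false` as the second argument gives big-endian output instead. If the receiver expects a BOM, the code unit 0xFEFF would have to be written first; the sketch leaves that out.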

What you should really do is get the text value from your HTML element, Base64-encode it as UTF-8, and have whatever receives that data decode it as UTF-8; don't make JavaScript do more work than it has to. I presume you're sending this data to a server, in which case your server language almost certainly has built-in functions that can decode text in many different encodings. So just use that. Don't solve Y when your actual problem is X.
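For the UTF-8 route, a rough sketch might look like the following. The helper name `utf8ToBase64` is an assumption for illustration; it relies on the standard `TextEncoder` and `btoa` globals. Going through `TextEncoder` matters because calling `btoa` directly on a string throws for any character outside the Latin-1 range.

```javascript
// Sketch: encode a string as UTF-8 bytes, then Base64-encode those bytes.
// `utf8ToBase64` is a hypothetical helper name, not a built-in API.
function utf8ToBase64(str) {
  // TextEncoder always produces UTF-8.
  const bytes = new TextEncoder().encode(str);
  // btoa expects a "binary string": one character per byte.
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

console.log(utf8ToBase64('this is a test'));
// dGhpcyBpcyBhIHRlc3Q=
```

The receiving side then only needs to Base64-decode the value and interpret the bytes as UTF-8.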

Upvotes: 1
