Reputation: 2477
So I'm converting a string to BASE64 as shown in the code below...
var str = "Hello World";
var enc = window.btoa(str);
This yields SGVsbG8gV29ybGQ=. However, if I add characters such as – and ”, as in the code shown below, the conversion doesn't happen. What is the reason behind this? Thank you so much.
var str = "Hello – World”";
var enc = window.btoa(str);
Upvotes: 7
Views: 9751
Reputation: 27222
This ultimately owes to a deficiency in the JavaScript type system.
JavaScript strings are strings of 16-bit code units, which are customarily interpreted as UTF-16. The Base64 encoding is a method of transforming an 8-bit byte stream into a string of digits, by taking each three bytes and mapping them into four digits, each covering 6 bits: 3 × 8 = 4 × 6. As we see, this is crucially dependent on the bit width of each symbol.
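To make the 3 × 8 = 4 × 6 arithmetic concrete, here is a minimal sketch that packs three bytes into one 24-bit group and slices it into four 6-bit digits. The name encodeTriplet is made up for illustration, and padding for inputs whose length is not a multiple of three is deliberately left out.

```javascript
// The standard Base64 alphabet: 64 digits, one per 6-bit value.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode exactly three bytes as four Base64 digits.
function encodeTriplet(b0, b1, b2) {
  // Concatenate the three 8-bit bytes into one 24-bit group.
  const group = (b0 << 16) | (b1 << 8) | b2;
  // Read the group back out as four 6-bit digits, most significant first.
  return (
    ALPHABET[(group >> 18) & 0x3f] +
    ALPHABET[(group >> 12) & 0x3f] +
    ALPHABET[(group >> 6) & 0x3f] +
    ALPHABET[group & 0x3f]
  );
}

// The bytes of "Hel" (0x48, 0x65, 0x6c) become the first four digits
// of the Base64 text for "Hello World".
console.log(encodeTriplet(0x48, 0x65, 0x6c)); // "SGVs"
```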
At the time the btoa function was defined, JavaScript had no type for 8-bit byte streams, so the API was defined to take the ordinary 16-bit string type as input, with the restriction that each code unit was supposed to be confined to the range [U+0000, U+00FF]; when encoded into ISO-8859-1, such a string would reproduce the intended byte stream exactly.
(Newer code should probably use Uint8Array.fromBase64 and Uint8Array.prototype.toBase64 instead, when those become available.)
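As a rough sketch of how code might adopt those methods while they are not available everywhere, the helpers below use them when present and otherwise fall back to btoa/atob. The names bytesToBase64 and base64ToBytes are hypothetical, and the choice of UTF-8 for the sample text is an assumption.

```javascript
// Hypothetical helper: Base64-encode raw bytes, preferring the native method.
function bytesToBase64(bytes) {
  if (typeof bytes.toBase64 === "function") {
    return bytes.toBase64(); // only present in newer engines
  }
  // Fallback: build a binary string (one code unit per byte) for btoa.
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Hypothetical helper: decode Base64 text back into raw bytes.
function base64ToBytes(base64) {
  if (typeof Uint8Array.fromBase64 === "function") {
    return Uint8Array.fromBase64(base64);
  }
  return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
}

// Assumption: the text is encoded as UTF-8 before being turned into Base64.
const sampleBytes = new TextEncoder().encode("Hello – World”");
const sampleBase64 = bytesToBase64(sampleBytes);
console.log(sampleBase64); // "SGVsbG8g4oCTIFdvcmxk4oCd"
console.log(new TextDecoder().decode(base64ToBytes(sampleBase64)));
```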
The character – is U+2013, while ” is U+201D; neither of those characters fits into the above-mentioned range, so the function rejects the string.
If you want to convert Unicode text into Base64, you need to pick a character encoding and convert it into a byte string first, and encode that. Asking for a Base64 encoding of a Unicode string itself is meaningless.
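To illustrate why the choice of encoding matters, the sketch below encodes the same character as UTF-8 and as UTF-16LE and shows that the resulting Base64 differs. The helper name bytesToB64 is made up for this example, and the UTF-16LE bytes are assembled by hand since TextEncoder only produces UTF-8.

```javascript
const dash = "–"; // U+2013

// Option 1: UTF-8 bytes via TextEncoder (which always emits UTF-8).
const utf8Bytes = new TextEncoder().encode(dash);

// Option 2: UTF-16LE bytes, built by hand from the string's code units.
const utf16leBytes = new Uint8Array(dash.length * 2);
for (let i = 0; i < dash.length; i++) {
  const unit = dash.charCodeAt(i);
  utf16leBytes[2 * i] = unit & 0xff; // low byte first
  utf16leBytes[2 * i + 1] = unit >> 8; // then high byte
}

// Hypothetical helper: bytes -> binary string -> Base64 (small inputs only).
const bytesToB64 = (bytes) => btoa(String.fromCharCode(...bytes));

console.log(bytesToB64(utf8Bytes)); // "4oCT" (bytes E2 80 93)
console.log(bytesToB64(utf16leBytes)); // "EyA=" (bytes 13 20)
```

Whoever decodes the Base64 has to know which encoding was picked, otherwise the bytes cannot be turned back into the original text.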
Upvotes: 0
Reputation: 59
I struggled with this one too, so I wrote two functions.
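// Text to Base64: the UTF-8 byte values are joined as space-separated decimals, then encoded.
function TtB64(txt) {
  return btoa(new TextEncoder().encode(txt).join(' '));
}
// Base64 to text: reverses TtB64 (the format is not standard Base64 of the UTF-8 bytes).
function TfB64(txt) {
  return new TextDecoder().decode(new Uint8Array(atob(txt).split(' ').map(x => parseInt(x, 10))));
}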
The first one converts any text to Base64, and the second one converts Base64 back to text.
Upvotes: 0
Reputation: 59
Why so complicated?
btoa(String.fromCharCode(...new TextEncoder().encode("Hello – World”")))
will do.
Upvotes: -1
Reputation: 53597
btoa is an exotic function in that it requires a "binary string", i.e. a String in which every "letter" represents a byte rather than a letter. As such, you can't have any "letters" with Unicode code points above 0xFF (char code 255), such as your en dash and "fancy" quote symbol.
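A small sketch of that restriction: the hypothetical helper findNonByteChars lists the code units that fall outside 0x00..0xFF, which are the ones btoa will refuse. The helper name and the shape of its return value are illustrative choices only.

```javascript
// Hypothetical helper: report every code unit that does not fit in one byte.
function findNonByteChars(str) {
  const offenders = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit > 0xff) {
      const hex = unit.toString(16).toUpperCase().padStart(4, "0");
      offenders.push({ index: i, char: str[i], code: "U+" + hex });
    }
  }
  return offenders;
}

console.log(findNonByteChars("Hello World")); // nothing to report
console.log(findNonByteChars("Hello – World”")); // reports U+2013 and U+201D

// btoa throws as soon as it meets one of those characters.
let btoaThrew = false;
try {
  btoa("Hello – World”");
} catch (err) {
  btoaThrew = true;
}
console.log(btoaThrew); // true
```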
You'll either have to URI-encode the data first, making it safe:
> var str = `Hello – World`;
> window.btoa(encodeURIComponent(str));
"SGVsbG8lMjAlRTIlODAlOTMlMjBXb3JsZA=="
And then remember to decode it again when unpacking yourself:
> var base64= "SGVsbG8lMjAlRTIlODAlOTMlMjBXb3JsZA==";
> decodeURIComponent(window.atob(base64));
"Hello – World"
Or rely on targets that automatically apply URI decoding, like href attributes (a, link, etc.).
However, if your target doesn't (your own code, or src attributes on img, script, etc.) then you'll need to turn your string into a new string that conforms to single-byte packing. This is explicitly called out over on the MDN page for Base64, with their solution being:
function base64(data) {
  const bytes = new TextEncoder().encode(data);
  const binString = String.fromCodePoint(...bytes);
  return btoa(binString);
}
with the equivalent decoder:
function decode64(base64) {
  const binString = atob(base64);
  const bytes = Uint8Array.from(binString, (m) => m.codePointAt(0));
  return new TextDecoder().decode(bytes);
}
You'll need decode64 if you want to unpack things in your own code, but the base64 function will yield a converted string that will work when put into a data URL (e.g. data:text/javascript;base64,${base64text}).
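One caveat with the base64 function above: spreading a very large byte array into a single String.fromCodePoint(...bytes) call can exceed the engine's argument limit and throw a RangeError. A possible workaround, sketched here under the assumption that a chunk size of 0x8000 is small enough for common engines, is to build the binary string in chunks. The name base64Chunked and the chunkSize parameter are hypothetical.

```javascript
// Hypothetical variant of base64() that never spreads more than
// chunkSize bytes into one function call.
function base64Chunked(data, chunkSize = 0x8000) {
  const bytes = new TextEncoder().encode(data);
  let binString = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() creates a view on the same buffer, so nothing is copied.
    const chunk = bytes.subarray(i, i + chunkSize);
    binString += String.fromCharCode(...chunk);
  }
  return btoa(binString);
}

console.log(base64Chunked("Hello – World”")); // "SGVsbG8g4oCTIFdvcmxk4oCd"

// A long input that is split across many chunks still encodes cleanly.
const longText = "Hello – World” ".repeat(100000);
console.log(base64Chunked(longText).length);
```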
Upvotes: 13
Reputation: 136765
The most bulletproof way is to work on binary data directly.
For this, you can encode your string to a Uint8Array holding the UTF-8 version of your string.
Then a FileReader instance will be able to give you the Base64 quite easily.
var str = "Hello – World”";
var buf = new TextEncoder().encode( str );
var reader = new FileReader();
reader.onload = evt => { console.log( reader.result.split(',')[1] ); };
reader.readAsDataURL( new Blob([buf]) );
And since the Blob() constructor automatically encodes DOMString instances to UTF-8, we could even get rid of the TextEncoder object:
var str = "Hello – World”";
var reader = new FileReader();
reader.onload = evt => { console.log( reader.result.split(',')[1] ); };
reader.readAsDataURL( new Blob([str]) );
Upvotes: 0
Reputation: 3968
The problem is that the character ” (like –) lies outside of the Latin-1 range.
For this you can use unescape (now deprecated):
var str = "Hello – World”";
var enc = btoa(unescape(encodeURIComponent(str)));
alert(enc);
And to decode:
var encStr = "SGVsbG8g4oCTIFdvcmxk4oCd";
var dec = decodeURIComponent(escape(window.atob(encStr)))
alert(dec);
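Since unescape and escape are deprecated, here is a sketch of the same idea without them. encodeURIComponent writes the non-ASCII characters as %XX escapes of their UTF-8 bytes, and a regular expression then turns each escape into a single code unit, which is roughly what unescape did in the snippet above. The function names toBase64Utf8 and fromBase64Utf8 are made up for illustration.

```javascript
// Hypothetical replacement for btoa(unescape(encodeURIComponent(str))).
function toBase64Utf8(str) {
  const binary = encodeURIComponent(str).replace(
    /%([0-9A-F]{2})/g,
    // Each %XX escape is one UTF-8 byte; map it to one code unit.
    (_, hex) => String.fromCharCode(parseInt(hex, 16))
  );
  return btoa(binary);
}

// Hypothetical replacement for decodeURIComponent(escape(atob(base64))).
function fromBase64Utf8(base64) {
  const escaped = Array.from(
    atob(base64),
    // Percent-escape every byte so decodeURIComponent can read it as UTF-8.
    (ch) => "%" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
  return decodeURIComponent(escaped);
}

console.log(toBase64Utf8("Hello – World”")); // "SGVsbG8g4oCTIFdvcmxk4oCd"
console.log(fromBase64Utf8("SGVsbG8g4oCTIFdvcmxk4oCd")); // "Hello – World”"
```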
Upvotes: 1