Erik Jacobs

Reputation: 901

Creating a Blob or a File from JavaScript binary string changes the number of bytes?

I have been playing with a few JS encryption libraries (CryptoJS, SJCL) and discovered problems related to the Blob/File APIs and JavaScript "binary strings".

I realized that the encryption isn't even really relevant, so here's a much simplified scenario. Simply read a file in using readAsBinaryString and then create a Blob:

>>> reader.result
"GIF89a����ÿÿÿÿÿÿ!þCreated with GIMP�,�������D�;"
>>> reader.result.length
56
>>> typeof reader.result
"string"
>>> blob = new Blob([reader.result], {type: "image/gif"})
Blob { size=64, type="image/gif", constructor=function(), more...}

I have created a JSFiddle that will basically do the above: it simply reads any arbitrary file, creates a blob from it, and outputs the length vs size: http://jsfiddle.net/6L82t/1/

It appears that, when creating the Blob from the "binary (javascript) string", something with character encoding ends up munging the result.

If a non-binary file is used, you will see that the lengths of the Blob and the original binary string are identical.

So something happens when creating a Blob/File from a non-plaintext JavaScript string, and I need whatever that is to not happen. I think it may have something to do with the fact that JS strings are UTF-16?

There's a (maybe) related thread here: HTML5 File API read as text and binary

Do I need to possibly take the decrypted results (UTF-16) and "convert" them to UTF-8 before putting them in a Blob/File?

Working with someone in #html5 on Freenode, we determined that if you read an ArrayBuffer directly and then create the blob from that by first using a Uint8Array, the bytes work out just fine. You can see a fiddle that essentially does that here: http://jsfiddle.net/GH7pS/4/

The issue is, at least in my scenario, I am going to end up with a binary string and would like to figure out how to directly convert that into a Blob, so that I can then use HTML5's download attribute to let the user click a link and download the blob directly.

Thanks!

Upvotes: 20

Views: 33891

Answers (1)

Bergi

Reputation: 664599

It appears that, when creating the Blob from the "binary (javascript) string", something with character encoding ends up munging the result.

Yes. That post you read explains well how a "binary string" is constituted.

The Blob constructor, in contrast, does the following with each string it is given:

  1. Let s be the result of converting [the string] to a sequence of Unicode characters using the algorithm for doing so in WebIDL.
  2. Encode s as UTF-8 and append the resulting bytes to [the blob].
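To see where the extra bytes come from, here is a small sketch that counts how many bytes a string occupies once it is encoded as UTF-8. It is not taken from the spec: the helper name utf8ByteLength is made up for illustration, and it assumes an environment with a global TextEncoder (modern browsers, recent Node.js).

```javascript
// Hypothetical helper: number of bytes a string occupies after UTF-8 encoding,
// which is the encoding the Blob constructor applies to string parts.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// A "binary string" holds one byte per character (char codes 0..255).
var ascii = "GIF89a";            // all char codes below 0x80
var high = "\u00ff\u00fe\u0080"; // three "bytes" at or above 0x80

console.log(ascii.length, utf8ByteLength(ascii)); // 6 6
console.log(high.length, utf8ByteLength(high));   // 3 6
```

Each character with a code of 0x80 or above turns into two bytes, so a 56-character string growing to a 64-byte Blob would be consistent with eight such characters in the file.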

We determined that if you read an ArrayBuffer directly and then create the blob from that by first using a Uint8Array, the bytes work out just fine.

Yes, that's how it is supposed to work. Just do the encryption on a Typed Array where you deal with the bytes individually, not on some string.

The issue is, at least in my scenario, I am going to end up with a binary string

Again: Try not to. Binary strings are deprecated.

I would like to figure out how to directly convert a binary string into a Blob. Do I need to possibly take the decrypted results (UTF-16) and "convert" them to UTF-8 before putting them in a Blob/File?

No, it is better not to attempt any string conversions. Instead, construct a Uint8Array holding the bytes that you want to get from the binary string, and pass that to the Blob constructor.

This should do it (untested):

// Copy each character code (0..255) of the binary string into one byte.
var bytes = new Uint8Array(str.length);
for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i);
}
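
Wrapped up as a function and followed by the Blob creation, the loop above might look like this. The names binaryStringToBytes and binaryStringToBlob are hypothetical, and the sketch assumes every character code in the input is in the range 0..255, as in a string produced by readAsBinaryString.

```javascript
// Hypothetical helper: copy each char code (assumed 0..255) into one byte.
function binaryStringToBytes(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    // Codes above 255 would be reduced modulo 256 by the typed array.
    bytes[i] = str.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical helper: build the Blob from the bytes rather than the string,
// so the constructor does not apply UTF-8 encoding.
function binaryStringToBlob(str, mimeType) {
  return new Blob([binaryStringToBytes(str)], { type: mimeType });
}

var sample = "GIF89a\u00ff\u00fe\u0080";
var sampleBytes = binaryStringToBytes(sample);
console.log(sampleBytes.length === sample.length); // true
```

In a browser, binaryStringToBlob(reader.result, "image/gif").size should then equal reader.result.length.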

Upvotes: 26
