Reputation: 901
I have been playing with a few JS encryption libraries (CryptoJS, SJCL) and discovered problems related to the Blob/File APIs and JavaScript "binary strings".
I realized that the encryption isn't even really relevant, so here's a much simplified scenario. Simply read a file in using readAsBinaryString and then create a Blob:
>>> reader.result
"GIF89a����ÿÿÿÿÿÿ!þCreated with GIMP�,�������D�;"
>>> reader.result.length
56
>>> typeof reader.result
"string"
>>> blob = new Blob([reader.result], {type: "image/gif"})
Blob { size=64, type="image/gif", constructor=function(), more...}
I have created a JSFiddle that will basically do the above: it simply reads any arbitrary file, creates a blob from it, and outputs the length vs size: http://jsfiddle.net/6L82t/1/
It appears that, when creating the Blob from the "binary (javascript) string", something with character encoding ends up munging the result.
If a non-binary file is used, you will see that the lengths of the Blob and the original binary string are identical.
So something happens when creating a Blob/File from a non-plaintext JavaScript string, and I need whatever that is not to happen. I think it may have something to do with the fact that JS strings are UTF-16?
There's a (maybe) related thread here: "HTML5 File API read as text and binary"
Do I need to possibly take the decrypted results (UTF-16) and "convert" them to UTF-8 before putting them in a Blob/File?
Working with someone in #html5 on Freenode, we determined that if you read an ArrayBuffer directly and then create the blob from that by first using a Uint8Array, the bytes work out just fine. You can see a fiddle that essentially does that here: http://jsfiddle.net/GH7pS/4/
The issue is that, at least in my scenario, I am going to end up with a binary string, and I would like to figure out how to convert that directly into a Blob so that I can then use HTML5's download attribute to let the user click to download the blob.
Thanks!
Upvotes: 20
Views: 33891
Reputation: 664599
> It appears that, when creating the Blob from the "binary (javascript) string", something with character encoding ends up munging the result.
Yes. That post you read explains well how a "binary string" is constituted.
The Blob constructor, in contrast, does the following:
- Let s be the result of converting [the string] to a sequence of Unicode characters using the algorithm for doing so in WebIDL.
- Encode s as UTF-8 and append the resulting bytes to [the blob].
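The size mismatch in the question follows from these two steps. Here is a minimal sketch, assuming an environment with global TextEncoder and Blob (current browsers, Node 18+). The sample string is made up for illustration:

```javascript
// A "binary string" holds one byte value (0-255) per UTF-16 code unit.
// Hypothetical sample: two ASCII bytes followed by two bytes >= 0x80.
var str = "\x47\x49\x80\xff";

// UTF-8 needs two bytes for every code point from U+0080 to U+07FF,
// so each "high" byte in the binary string doubles in size.
var utf8Length = new TextEncoder().encode(str).length;

// Passing the string to the Blob constructor applies the same encoding.
var blobFromString = new Blob([str]);

console.log(str.length);          // 4
console.log(utf8Length);          // 6
console.log(blobFromString.size); // 6
```

Applied to the question's numbers, and assuming every character code is at most 0xFF (as is the case for readAsBinaryString output), a 56-character string producing a 64-byte Blob means 8 of its characters had codes of 0x80 or above.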
> We determined that if you read an ArrayBuffer directly and then create the blob from that by first using a Uint8Array, the bytes work out just fine.
Yes, that's how it is supposed to work. Just do the encryption on a Typed Array where you deal with the bytes individually, not on some string.
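A rough sketch of that byte-oriented approach, assuming the data arrives as an ArrayBuffer (for example from FileReader.readAsArrayBuffer). The function processBytes and its XOR step are hypothetical placeholders for whatever the crypto library actually does. XOR with a constant is not real encryption:

```javascript
// Hypothetical stand-in for an encryption/decryption step that works on bytes.
// XOR with a fixed key is NOT secure; it only illustrates the data flow.
function processBytes(buffer, key) {
    var input = new Uint8Array(buffer);        // view over the raw bytes
    var output = new Uint8Array(input.length);
    for (var i = 0; i < input.length; i++) {
        output[i] = input[i] ^ key;            // result stays within 0-255
    }
    return output;
}

// Example input: bytes that would be inflated if they passed through a string.
var source = new Uint8Array([0x47, 0x49, 0x80, 0xff]).buffer;
var processed = processBytes(source, 0x5a);

// A Blob built from a typed array copies the bytes verbatim, with no re-encoding.
var blobFromBytes = new Blob([processed], { type: "application/octet-stream" });
console.log(blobFromBytes.size); // 4
```

Because the data never passes through a string, the Blob size always equals the number of bytes processed.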
> The issue is, at least in my scenario, I am going to end up with a binary string
Again: try not to. Binary strings are deprecated.
> I would like to figure out how to directly convert a binary string into a Blob. Do I need to possibly take the decrypted results (UTF-16) and "convert" them to UTF-8 before putting them in a Blob/File?
No, it is better not to attempt any string conversions. Instead, construct a Uint8Array holding the bytes that the binary string represents, and build the Blob from that array.
This should do it (untested):
// Copy each character code (0-255) of the binary string into one byte.
var bytes = new Uint8Array(str.length);
for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i);
}
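Combining that loop with the Blob constructor and the download link the question asks about might look like the following sketch. The function names binaryStringToBlob and offerDownload are made up for illustration, and offerDownload assumes a browser that provides document and URL.createObjectURL:

```javascript
// Convert a binary string (one byte per character) into a Blob without
// letting the Blob constructor re-encode it as UTF-8.
function binaryStringToBlob(str, mimeType) {
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        // Mask to the low 8 bits in case a stray code above 0xFF slips in.
        bytes[i] = str.charCodeAt(i) & 0xff;
    }
    return new Blob([bytes], { type: mimeType });
}

// Browser-only helper (hypothetical): add a link the user can click to save the Blob.
function offerDownload(blob, fileName) {
    var link = document.createElement("a");
    link.href = URL.createObjectURL(blob);
    link.download = fileName;                  // HTML5 download attribute
    link.textContent = "Download " + fileName;
    document.body.appendChild(link);
    return link;
}

// Sample binary string: six ASCII bytes plus two bytes >= 0x80.
var gifBlob = binaryStringToBlob("GIF89a\x80\xff", "image/gif");
console.log(gifBlob.size); // 8
```

In a browser, calling offerDownload(gifBlob, "image.gif") would then expose the Blob as a clickable download. The object URL can be released with URL.revokeObjectURL once it is no longer needed.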
Upvotes: 26