Reputation: 112
To be brief: I've written a file uploader that uses the HTML5 FileReader API and XHR POSTs to upload user-selected files to a server. The client-side code has a few essential tasks: reading values from the selected files' headers (these are DICOM image files) and displaying them before sending the files to the server, updating a progress bar, and so on. There are additional features as well, such as zipping the files when that speeds things up.
Fairly quickly, I noticed that large files ate up a ton of memory (this is Chrome-specific). Given a large enough data set, Chrome "Aw, Snap!"s and crashes entirely. I've implemented countless fixes: exhaustive searches for memory leaks, delaying the reading and sending of files with callbacks and a small queue, reading only fixed-size chunks of each file at a time, etc. As you can imagine, this has led to some pretty hefty client-side JavaScript (CoffeeScript, actually). In the following fiddle, a coworker and I pared it down to the bare essentials: reading all selected files in chunks and setting a variable to the resulting binary data (sparing everyone the code that parses the headers, zips when necessary, and sends each chunk).
https://jsfiddle.net/3nails4you/gsqzrk9g/8/, or see below:
HTML:
<input id="file" type="file" onchange="slice()" multiple="" />
JavaScript:
function slice() {
    var filesArr = document.getElementById('file').files;
    var index;
    for (index = 0; index < filesArr.length; index++) {
        readFile(filesArr[index]);
    }
}

function readFile(file) {
    var fr = new FileReader(),
        chunkSize = 2097152,
        chunks = Math.ceil(file.size / chunkSize),
        chunk = 0;

    function loadNext() {
        var start, end, blob;
        start = chunk * chunkSize;
        end = start + chunkSize >= file.size ? file.size : start + chunkSize;
        fr.onload = function (e) {
            // get file content
            var filestream = e.target.result;
            if (++chunk < chunks) {
                console.info(chunk);
                loadNext();
            }
        };
        blob = file.slice(start, end);
        fr.readAsBinaryString(blob);
    }

    loadNext();
}
I have tried different reading methods (readAsArrayBuffer, readAsDataURL), several scoping structures (e.g., declaring a single FileReader and reusing it), and many different chunk sizes for optimization. When I select a particular data set that is ~1 GB across 16 files, memory usage looks like this:
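For reference, the ArrayBuffer variant mentioned above can be sketched roughly as follows. This is a minimal sketch, not the production code: chunkRanges and readFileChunked are hypothetical helper names, the 2 MB chunk size mirrors the snippet above, and readAsArrayBuffer is used in place of the (now-deprecated) readAsBinaryString so each chunk arrives as a compact binary buffer:

// Pure helper (hypothetical name): compute [start, end) byte ranges
// covering a file of the given size.
function chunkRanges(fileSize, chunkSize) {
    var ranges = [], start;
    for (start = 0; start < fileSize; start += chunkSize) {
        ranges.push([start, Math.min(start + chunkSize, fileSize)]);
    }
    return ranges;
}

// Browser-only sketch: read one chunk at a time as an ArrayBuffer,
// reusing a single FileReader and slicing lazily so only one chunk
// needs to be resident at a time.
function readFileChunked(file, onChunk, onDone) {
    var fr = new FileReader(),
        ranges = chunkRanges(file.size, 2097152),
        index = 0;
    fr.onload = function (e) {
        onChunk(e.target.result, index); // ArrayBuffer for this chunk
        index += 1;
        if (index < ranges.length) {
            loadNext();
        } else if (onDone) {
            onDone();
        }
    };
    function loadNext() {
        var range = ranges[index];
        fr.readAsArrayBuffer(file.slice(range[0], range[1]));
    }
    loadNext();
}

Structurally this is the same loop as the fiddle; the difference is only the result type handed to the onload callback.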
[EDIT] I'm not able to post images yet, so I'll describe it instead: in the Windows Task Manager, the Chrome process is using about 625,000 K of memory.
Notably, if I wait for the reading to finish (the console log stops outputting), the memory usage becomes static. If, at that point, I open the JavaScript console, memory usage drops back to what it was before the file reading began. My suspicion is that opening the console triggers Chrome's garbage collection, or something along those lines, but I'm uncertain.
I've found other questions about somewhat similar issues, but all of them are answered under the assumption that the client doesn't actually need to use the binary data of the file. I absolutely do. Any suggestions? Is this simply a bug to report to the Chromium project? Is there a glaring error in my code that I've missed? I usually suspect the latter, but the "opening the console clears the memory" behavior continues to irk me: if there were a memory leak, would that really happen? Thanks for reading, I appreciate any suggestions!
Upvotes: 1
Views: 2568
Reputation: 112
In case anyone stumbles onto this question with the same problem, I thought I'd share what we found to alleviate it.
I ended up purchasing a license and incorporating plupload into my CoffeeScript. It helps solve the memory issue in this way:
First, I create a new plupload uploader object and set its event handlers (BeforeUpload, UploadProgress, etc.). Its Destroy handler calls a JavaScript function, nextUploader(), which creates another uploader object and queues up the next portion of the files. After the destroy occurs, the plupload object's memory is successfully reclaimed, so the browser's memory usage stays within a reasonable range.
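The rotating-uploader pattern described above can be sketched like this. It is only an illustration, not our production configuration: batchFiles and uploadBatches are hypothetical names, the /upload URL and batch size are placeholders, and the plupload settings shown are the bare minimum for a plupload 2.x Uploader:

// Pure helper (hypothetical name): split the selected files into
// fixed-size batches, one batch per short-lived uploader.
function batchFiles(files, batchSize) {
    var batches = [], i;
    for (i = 0; i < files.length; i += batchSize) {
        batches.push(files.slice(i, i + batchSize));
    }
    return batches;
}

// Browser-only sketch: upload one batch, destroy the uploader, and let
// the Destroy handler kick off the next batch (the nextUploader() role).
function uploadBatches(batches, index) {
    if (index >= batches.length) { return; }
    var uploader = new plupload.Uploader({
        runtimes: 'html5',
        url: '/upload',        // placeholder endpoint
        browse_button: 'file'  // plupload requires a browse element even when queuing programmatically
    });
    uploader.bind('Init', function (up) {
        batches[index].forEach(function (f) { up.addFile(f); });
        up.start();
    });
    uploader.bind('UploadProgress', function (up, file) {
        console.info(file.name + ': ' + file.percent + '%');
    });
    uploader.bind('UploadComplete', function (up) {
        up.destroy(); // releases the uploader's memory
    });
    uploader.bind('Destroy', function () {
        uploadBatches(batches, index + 1); // queue the next portion of files
    });
    uploader.init();
}

The key point is that each uploader is short-lived: once its batch completes, it is destroyed and replaced, so memory never accumulates across the whole file set.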
If anyone is looking to do HTML5 file reading and uploading, I highly recommend exploring plupload. It's quite easy to use, and we found that Dropbox uses it as well.
Upvotes: 1