Reputation: 23
We are trying to build a client-based application with HTML5 and IndexedDB on Firefox 28. On first use, large amounts of data are loaded into Firefox as JSON via AJAX requests. Each JSON response is about 2MB (gzipped), and the data is stored to IndexedDB with IDBWrapper.
Firefox's memory usage increases rapidly with each response. After about 12 responses, its memory grows to about 1GB (3GB total), and sometimes this causes an AJAX error. After loading finishes successfully, memory drops straight back to 500MB or below.
It seems that during the whole loading/storing process Firefox never gets a chance to run GC. We've tried adding a setTimeout between responses, but it doesn't help.
My question is: is there any other way to reduce Firefox's memory usage while loading big JSON?
EDIT: The code is something like this:
$.ajax({
    url : URL['getData'],
    dataType : "json",
    data : {
        startDate : _startDate,
        endDate : _endDate,
        format : "json"
    },
    async : true,
    success : function(_data) {
        _data = null;
        // store to indexeddb
        _callback();
    },
    error : function() {
        setCurrentDataLabel('Error', 0);
    }
});
I removed the store-to-IndexedDB part to keep it simple. Funnily enough, memory still increased quickly. However, if I change dataType : "json" to dataType : "text", memory usage is much smaller and GC visibly runs. It seems Firefox has some performance issue while handling JSON.
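For what it's worth, a minimal sketch of that workaround (reusing the URL, parameters and _callback from the code above) might look like this; it fetches the response as plain text, parses it manually with JSON.parse, and nulls out both the string and the parsed tree as early as possible:

$.ajax({
    url : URL['getData'],
    dataType : "text", // skip jQuery's built-in JSON handling
    data : {
        startDate : _startDate,
        endDate : _endDate,
        format : "json"
    },
    async : true,
    success : function(_text) {
        var _data = JSON.parse(_text);
        _text = null;   // drop the raw response string
        // store _data to indexeddb here
        _data = null;   // drop the parsed tree once it's stored
        _callback();
    },
    error : function() {
        setCurrentDataLabel('Error', 0);
    }
});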
Upvotes: 2
Views: 4444
Reputation: 5171
Server-side Pagination
If the "server" in question is running some programming language/framework, and if you can easily make changes to the code running on that server, then the simplest approach would be what @Coder
has alluded to in the comments. i.e: server-side pagination.
I'll leave @Coder
to provide an answer for the "server pagination" case.
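To make the idea concrete, here's a rough client-side sketch of what paginated loading could look like; the page/pageSize request parameters and the rows/totalPages response fields are assumptions about the server, not something from the question:

function loadPage(page) {
    $.ajax({
        url : URL['getData'],
        dataType : "json",
        data : {
            startDate : _startDate,
            endDate : _endDate,
            format : "json",
            page : page,     // assumed server-side parameter
            pageSize : 500   // assumed server-side parameter
        },
        success : function(_data) {
            // store _data.rows to indexeddb, then let them be collected
            if (page < _data.totalPages) {   // assumed response field
                loadPage(page + 1);          // fetch the next page
            } else {
                _callback();
            }
        },
        error : function() {
            setCurrentDataLabel('Error', 0);
        }
    });
}
loadPage(1);

Because each response is small, the parsed JSON tree for any one page can be collected before the next page arrives.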
Client-side Streaming
On the other hand, if there's no code you can change on the server (e.g. it's a static JSON file that's produced by some other process, and there's only a simple static HTTP server in place) then you might want to take the approach that I mentioned in the comments, i.e. having the client do the streaming.
In this approach, instead of using XMLHttpRequest (AJAX), you use the native platform APIs in Firefox (StreamListener).
NOTE 1:
You can't use GZIP encoding on the server side (explicitly or automatically) and still do streaming reads on the client. You'll either have to explicitly configure the server to serve the JSON uncompressed, or you can simply use the Accept-Encoding: header to ask the server not to send you compressed content (ask for a non-existent compression scheme like dont-compress-please).
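For example, on the nsIHttpChannel you open for the request (httpChannel below stands for whatever your channel variable is called), that's a one-liner:

// Advertise a compression scheme the server won't recognise, so it
// falls back to sending the JSON uncompressed.
httpChannel.setRequestHeader("Accept-Encoding", "dont-compress-please", false);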
NOTE 2:
You can't do this kind of HTTP request (yet) from 'content' JavaScript (i.e: JavaScript content served by the web-server). You can only do this from 'privileged' JavaScript (e.g: inside a Firefox addon).
Sandboxed HTTP requests
I won't include an example of making the request itself (it's pretty big), but you can basically just cut & paste the first (simplest) example from the Creating Sandboxed HTTP Connections page on MDN.
In the example, in the onDataAvailable callback they just add the read text to a buffer (string) like this:
this.mData += scriptableInputStream.read(aLength);
Instead, you want to have your pull parser 'tokenize' only the newly read text and emit events for any fully parsed parts of the data read so far (and not build a string in memory representing the text read from the server).
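Concretely, assuming a pull parser that exposes a write()-style method (this.mParser below is a placeholder for whichever parser you pick), the callback from the MDN example becomes something like:

onDataAvailable : function(aRequest, aContext, aStream, aSourceOffset, aLength) {
    var scriptableInputStream =
        Components.classes["@mozilla.org/scriptableinputstream;1"]
                  .createInstance(Components.interfaces.nsIScriptableInputStream);
    scriptableInputStream.init(aStream);
    // Tokenize only the newly read chunk; the parser's state machine
    // remembers where it is in the overall JSON document.
    this.mParser.write(scriptableInputStream.read(aLength));
},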
JSON Pull Parser
A 'pull parser' doesn't parse a string all at once (like, say, JSON.parse); instead you call its tokenize() method on each chunk of the source text that you receive. The parser takes care of figuring out where it's up to in the input stream (using a "state machine"), and when it begins or ends reading a so-called 'atom' (for JSON, e.g. the start of an Array, a Number, the end of an Array, the start of an Object, etc.) it will call a callback you supply (or emit an event) to tell you about it.
So in your case, say the JSON is basically an Array of Objects. You should basically ignore the 'start of Array', then for each 'end of Object' you write the data in that single Object to your local storage. That way, the biggest thing you ever have in memory at one time is a single row/element from your larger file.
I found two JavaScript-based JSON Pull Parsers:
You need to configure the parser of your choice to "listen" to the appropriate starts/ends in your source-text.
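As a rough sketch of that configuration (the event names below are invented for illustration; the real ones depend on the parser you pick, and store is assumed to be an IDBWrapper instance like the one from the question):

var currentObject = null;

parser.on('startArray', function() { /* ignore the outer array */ });
parser.on('startObject', function() {
    currentObject = {};
});
parser.on('keyValue', function(key, value) {
    currentObject[key] = value;
});
parser.on('endObject', function() {
    // Write just this one row; nothing bigger ever accumulates in memory.
    store.put(currentObject, function onSuccess(id) {
        // this row is safely stored
    });
    currentObject = null;
});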
Privileged JavaScript
As the StreamListener interface is privileged, if you still want this web-app to mostly run off the web (and just have an offline mode) then it will need to be implemented in a Firefox addon.
Additionally, as the pull parser is called by the StreamListener callback, it too will need to be implemented as privileged JavaScript inside the addon.
The easiest way to do both of those is to implement a bootstrapped (a.k.a "restartless") Firefox addon which 'exports' the functionality you require 'into' your browser window so that your unprivileged 'content' JavaScript code can use it.
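The skeleton of such a bootstrapped addon's bootstrap.js is tiny; the addon-specific part is how startup() exposes your streaming entry point to the page, which is what the answers linked below walk through (streamJSON here is just a placeholder name):

function install(aData, aReason) {}
function uninstall(aData, aReason) {}

function startup(aData, aReason) {
    // Wire up the privileged StreamListener + pull-parser machinery here,
    // then expose an entry point (e.g. a 'streamJSON' function) to the
    // unprivileged content page so it can trigger a streaming load.
}

function shutdown(aData, aReason) {
    // Undo everything startup() did -- required for restartless addons.
}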
I've answered how to do this twice before (and there's a fair bit of code involved):
Upvotes: 3