reuns

Reputation: 884

Loading large compressed json in html

I have a 20MB .json.gz file which becomes 280MB when uncompressed. I'd like to obtain the corresponding JavaScript object in my web page so that I can work with it. Unfortunately it is now clear that the standard methods fail once the data approaches the 256MB limit.

There are two versions of the file: v3 is 20MB and v2 is 16MB (280MB and 230MB uncompressed).

For the v2 file one solution worked: using jQuery

  $.getJSON( "./data_package2.json.gz" , function( res ){    /* res contains the parsed object */ });

With the v3 file it now fails somewhere during parsing (jQuery's code is hard to debug so I can't say more, and the error message depends on jQuery's version).

Concretely, in this web page the v2 button works fine but the v3 one does not.

I tried loading both compressed JSON files in Python and they both work:

    pip install compress_json
    python

        import compress_json
        D1 = compress_json.load("data_package3.json.gz")
        D1["case_data"][1]
        # works fine, even though it uses about 800MB of RAM

I'd like some help understanding what fails in jQuery's code, and possibly a JavaScript zlib/JSON-parser that will work on the v3 file.

Upvotes: 0

Views: 1477

Answers (3)

reuns

Reputation: 884

Here is code that works:

<script src="uint8array-json-parser.min.js"></script>
<script>
var result;
function data() {
    var req = new XMLHttpRequest();
    req.responseType = "arraybuffer";
    req.onload = function (oEvent) {
      var arrayBuffer = req.response;          // already gunzipped by the browser
      var data = new Uint8Array(arrayBuffer);
      result = JSON_parse(data);               // JSON_parse comes from uint8array-json-parser.min.js
    };
    req.open("GET", "big.bin", true);
    req.send();
}
</script>

where uint8array-json-parser.min.js comes from this package.

big.bin is a .json.gz file, which means you need to modify the .htaccess this way (when the file is on an Apache server you have access to):

<FilesMatch ".*\.bin$">
    <IfModule headers_module>
        # Some servers need mod_headers.c instead of headers_module, and some refuse
        # this kind of override (e.g. when the root folder's .htaccess denies overrides).
        Header set Content-Encoding "gzip"
        Header set Content-Type "binary/octet-stream"
    </IfModule>
</FilesMatch>

The important part is the Content-Encoding: gzip header, which tells the browser to decompress the downloaded file before handing it to JavaScript in req.response.
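
For what it's worth, the same download can be written with fetch; a minimal sketch under the same assumptions (the Content-Encoding header is set, and JSON_parse comes from uint8array-json-parser.min.js):

fetch("big.bin")
    .then(function (response) { return response.arrayBuffer(); }) // already gunzipped by the browser
    .then(function (buffer) {
        result = JSON_parse(new Uint8Array(buffer));
    });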

Be careful: Chrome adds Content-Encoding: gzip when it sees a .json.gz file, whereas Firefox does not.

The main problem with this approach is that uint8array-json-parser.min.js is a bit slower than JSON.parse. I would be interested to hear why JSON.parse doesn't accept a Uint8Array (in V8's C++ it would essentially be a pointer cast); maybe JSON.parse requires a string because strings are immutable?
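
For comparison, the bytes can be decoded to a string with TextDecoder and handed to the native JSON.parse; a sketch, with the caveat that a very large payload may then hit the engine's maximum string length:

var text = new TextDecoder("utf-8").decode(data); // data is the Uint8Array from above
result = JSON.parse(text);                        // native parser, but needs the intermediate string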

It is also possible to decompress the .json.gz in JavaScript (with, say, pako.js), but again this is slower than the native implementation, which we don't have direct access to (again, I would be interested to hear why).
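
A minimal sketch of that alternative, assuming pako is loaded and big.bin is served without the Content-Encoding header (so the bytes arrive still compressed):

fetch("big.bin")
    .then(function (response) { return response.arrayBuffer(); })
    .then(function (buffer) {
        var json = pako.ungzip(new Uint8Array(buffer), { to: "string" }); // gunzip in JS
        result = JSON.parse(json);
    });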

Upvotes: 1

B. Colin Tim

Reputation: 331

I was able to parse your large test file using Firefox Developer Edition and my own WASM GZip library wasm-gzip:

import init, { decompressStringGzip } from "../wasm_gzip.js";

init().then(() => {
    fetch("./data_package3.json.gz")
        .then((response) => response.arrayBuffer())
        .then((buffer) => {
            let arr = new Uint8Array(buffer);
            console.log(arr);
            let decompressed = decompressStringGzip(arr);
            console.log(decompressed);
            let obj = JSON.parse(decompressed);
            console.log(obj);
        });
});

Firefox Developer Edition: [console screenshot]

Google Chrome: [console screenshot]


  • My x64-based PC has 16GiB RAM.
  • Google Chrome: Version 88.0.4324.104 (Official Build) (64-bit)
  • Firefox (Developer): 86.0b4 (64-bit)

I would recommend doing the decompression in a Web Worker, because the page is unresponsive while all the decompressing and parsing runs on the main thread.
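
A minimal sketch of that, assuming a browser that supports module workers and that wasm_gzip.js sits next to a hypothetical worker.js:

// worker.js (module worker) — decompress and parse off the main thread
import init, { decompressStringGzip } from "./wasm_gzip.js";

self.onmessage = async (event) => {
    await init();                                  // initialize the WASM module
    const bytes = new Uint8Array(event.data);      // ArrayBuffer transferred from the page
    const obj = JSON.parse(decompressStringGzip(bytes));
    self.postMessage(obj);                         // structured clone back to the page
};

// main page
const worker = new Worker("./worker.js", { type: "module" });
worker.onmessage = (event) => console.log(event.data);

fetch("./data_package3.json.gz")
    .then((response) => response.arrayBuffer())
    .then((buffer) => worker.postMessage(buffer, [buffer])); // transfer instead of copying

Note that posting the parsed object back still pays for a structured clone; posting the decompressed string and parsing it on the main thread is another option.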

Upvotes: 2

reuns

Reputation: 884

A partial answer:

Splitting the big JSON into two files (10MB compressed each, 150MB and 130MB uncompressed) and merging the parsed objects in JavaScript works fine, as shown there.
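
A sketch of what I mean (the part file names are placeholders), using the same jQuery call as in the question:

Promise.all([
    $.getJSON("./data_package3_part1.json.gz"),
    $.getJSON("./data_package3_part2.json.gz"),
]).then(function (parts) {
    var merged = Object.assign({}, parts[0], parts[1]); // merge the top-level keys of the two halves
    /* work with merged here */
});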

So this is likely a problem with the maximum string size or something similar.

I'm using Windows 7 32-bit and Chrome, which might have an impact.

Should I open an issue with jQuery, since they would know better at which step the big file is likely to fail?

Upvotes: 0
