rich

Reputation: 19414

Compression method for short JSON objects

I'm going to store some JSON in a Redis instance. None of the JSON objects will be very long: think a single object per Redis key, with maybe 20 attribute-value pairs on each JSON object.

Something along the lines of:

Key 1:

{
 "id": "de305d54-75b4-431b-adb2-eb6b9e546011",
 "email": "[email protected]",
 "telephone": "01234567890",
 "age": 18
}

Key 2:

{
 "id": "de305d54-75b4-431b-adb2-eb6b9e546012",
 "email": "[email protected]",
 "telephone": "01234567890",
 "age": 19
}

There will be millions of entries like this.

About 12 of the attribute names will be static; the rest will vary, and I expect most of the values to vary too (though some may be true/false, a few may be low integers that happen to match, and some may resemble domain names).

Is there a compression algorithm, ideally with a Java implementation, that is well suited to this sort of data? Perhaps something where I can supply a static dictionary, rather than something like LZW that tries to learn from each piece of data?

Upvotes: 0

Views: 670

Answers (2)

rich

Reputation: 19414

This looks like what I was after:

http://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setDictionary(byte[])

After Boris the Spider's comment I intend to try the HUFFMAN_ONLY option, but haven't yet. FWIW, with a single sample of test data, the compressed byte[] is ~20% of the original size.
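For reference, a minimal sketch of the preset-dictionary approach. The DICTIONARY bytes below are a hypothetical placeholder; a real dictionary would be built from the static attribute names and common values seen in production data. Note that on the decompression side, Inflater signals via needsDictionary() that the stream was compressed against a dictionary, and the same bytes must then be supplied:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DictionaryCompression {

    // Hypothetical dictionary seeded with the static attribute names;
    // tune this against real data for better ratios.
    private static final byte[] DICTIONARY =
            "\"id\":\"\",\"email\":\"\",\"telephone\":\"\",\"age\":"
                    .getBytes(StandardCharsets.UTF_8);

    public static byte[] compress(String json) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        // The dictionary must be set before any data is compressed.
        deflater.setDictionary(DICTIONARY);
        deflater.setInput(json.getBytes(StandardCharsets.UTF_8));
        deflater.finish();

        byte[] buffer = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);

        byte[] buffer = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && inflater.needsDictionary()) {
                // The stream was compressed with a preset dictionary;
                // supply the same bytes used on the compression side.
                inflater.setDictionary(DICTIONARY);
            } else {
                out.write(buffer, 0, n);
            }
        }
        inflater.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}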

Upvotes: 0

You can try a more efficient alternative to JSON, such as a binary serialization format.
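The answer does not name a specific format, but as one illustration, here is a minimal sketch assuming MessagePack via the org.msgpack:msgpack-core library. Field names are packed inline without JSON's quoting and punctuation overhead, so the per-object size shrinks even before any compression. The email value below is a hypothetical placeholder:

import java.io.IOException;
import org.msgpack.core.MessageBufferPacker;
import org.msgpack.core.MessagePack;

public class MsgPackExample {

    public static byte[] encode() throws IOException {
        // Pack the sample object as a 4-entry map.
        MessageBufferPacker packer = MessagePack.newDefaultBufferPacker();
        packer.packMapHeader(4);
        packer.packString("id");
        packer.packString("de305d54-75b4-431b-adb2-eb6b9e546011");
        packer.packString("email");
        packer.packString("user@example.com"); // hypothetical value
        packer.packString("telephone");
        packer.packString("01234567890");
        packer.packString("age");
        packer.packInt(18);
        byte[] bytes = packer.toByteArray();
        packer.close();
        return bytes;
    }
}

A binary encoding like this can also be combined with the Deflater approach from the other answer, since the static attribute names still repeat across millions of entries.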

Upvotes: 1
