Reputation: 2237
I have a huge (~7 GB) JSON array of relatively small objects.
Is there a relatively simple way to filter these objects without loading the whole file into memory?
The --stream option looks suitable, but I can't figure out how to fold the stream of [path, value] events back into the original objects.
Upvotes: 18
Views: 15078
Reputation: 116680
jq 1.5 has a streaming parser. The jq FAQ gives an example of how to convert a top-level array of JSON objects into a stream of its elements:
$ jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
[{"foo":"bar"},{"foo":"baz"}]
{"foo":"bar"}
{"foo":"baz"}
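To actually filter rather than just split, a select can be appended after the reconstructed elements. This is a minimal sketch, assuming the objects carry a "foo" key as in the example above; the stream_filter helper name, the condition, and the sample file are made up for illustration:

```shell
# Hypothetical sample standing in for the real 7 GB file.
sample=$(mktemp)
printf '%s\n' '[{"foo":"bar","n":1},{"foo":"baz","n":2},{"foo":"bar","n":3}]' > "$sample"

# stream_filter FILE: rebuild each top-level array element from the
# [path, value] events, then keep only the elements matching the condition.
stream_filter() {
  jq -cn --stream 'fromstream(1|truncate_stream(inputs)) | select(.foo == "bar")' "$1"
}

stream_filter "$sample"
# prints:
# {"foo":"bar","n":1}
# {"foo":"bar","n":3}

rm -f "$sample"
```

Only one array element is rebuilt at a time, so memory use should depend on the largest element rather than on the size of the file. The streaming parser is typically slower than the regular one, so expect a long run on 7 GB.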
That may be enough for your purposes, but it is worth noting that setpath/2 can also be helpful. Here's how to produce a stream of "leaflets", that is, one single-leaf object per [path, value] event:
jq -c --stream '. as $in | select(length == 2) | {}|setpath($in[0]; $in[1])'
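One caveat for a top-level array: every path then starts with a numeric index, and setpath cannot apply a numeric index to {}. The variant below is my own assumption, not something taken from the FAQ: it drops the leading index with $in[0][1:] so each leaf lands in a small standalone object. The leaflets helper name and the sample data are hypothetical:

```shell
# Hypothetical sample: a top-level array, as in the question.
sample=$(mktemp)
printf '%s\n' '[{"foo":"bar","n":1},{"foo":"baz"}]' > "$sample"

# leaflets FILE: emit one single-leaf object per [path, value] event.
# select(length == 2) skips the closing events, which carry a path only.
# $in[0][1:] removes the leading array index from the path.
leaflets() {
  jq -c --stream '. as $in | select(length == 2) | {} | setpath($in[0][1:]; $in[1])' "$1"
}

leaflets "$sample"
# prints:
# {"foo":"bar"}
# {"n":1}
# {"foo":"baz"}

rm -f "$sample"
```

Note that the leaflets no longer say which array element they came from, so this form suits per-leaf filtering better than per-object filtering.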
Further information and documentation is available in the jq manual: https://stedolan.github.io/jq/manual/#Streaming
Upvotes: 16