Reputation: 599

jq: split a huge JSON array and save each element to a file named after a value

I have a JSON file containing an array of objects; every object contains a unique value in:

"id":"value"

I've followed this other answer, and I can split the whole document into multiple files using jq and awk:

jq -c ".[]" big.json | gawk '{print > "doc00" NR ".json";}'

This way the output files are named sequentially.
How can I name the files using the id value instead?

Upvotes: 1

Answers (3)

peak

Reputation: 116919

Since the problem description indicates that the input array is huge, it might be worth considering jq's streaming parser. In general, this is appropriate if the input JSON is too large to fit in memory, or if reducing memory usage is an important goal.

In brief, instead of invoking jq in the normal way, one adds the -n and --stream command-line options, and replaces the initial .[] by:

fromstream(1|truncate_stream(inputs))

Handling the splitting can then be done as described elsewhere on this page.
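As a rough sketch, the streaming form might be combined with the id-based naming from another answer on this page as follows. The file sample.json and its contents are made up for illustration, and -r is added here so that string ids are printed without JSON quotes:

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample standing in for big.json.
printf '%s\n' '[{"id":"a","v":1},{"id":"b","v":2}]' > sample.json

# -n: do not read the input implicitly; --stream: emit [path, leaf] events.
# truncate_stream drops the outer array index from each path, so fromstream
# rebuilds and emits one array element at a time.
jq -cnr --stream 'fromstream(1|truncate_stream(inputs)) | .id, .' sample.json |
  awk 'NR%2 {f = $0 ".json"; next} {print > f; close(f)}'

cat a.json b.json
```

With this form jq should only need to hold about one array element in memory at a time, rather than the whole array.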

Upvotes: 1

peak

Reputation: 116919

Using .id as part of a filename is fraught with risk.

First, there is the potential problem of embedded newline characters.

Second, there is the problem of "reserved" characters, notably "/".

Third, Windows has numerous restrictions on file names -- see e.g. https://gist.github.com/doctaphred/d01d05291546186941e1b7ddc02034d3.

Also, if jq's -r option is used, as suggested in another posting on this page, then .id values of "1" and 1 will both be mapped to 1, which will result in loss of data if ">" is used in awk.

So here is a solution that illustrates how safety can be achieved in an OS X or *ix environment and that goes a long way towards a safe solution for Windows:

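jq -c '.[]
       # Emit a sanitized file-name stem, then the object itself.
       | (.id | if type == "number" then .
                else tostring | gsub("[^A-Za-z0-9_-]";"+") end), .' big.json |
awk '
  # Strip the JSON quotes from the stem and append the extension.
  function fn(s) { sub(/^\"/,"",s); sub(/\"$/,"",s); return s ".json"; }
  NR%2 {f=fn($0); next}     # odd lines: file name
  {print >> f; close(f);}   # even lines: append the object
'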

Notice especially the use of ">>" to avoid losing data in the case of file name collisions.
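To illustrate the collision case, here is a small sketch run against made-up data (sample.json and its ids are assumptions for the demo): the ids "1" and 1 map to the same file name, and "a/b" contains a reserved character.

```shell
# Work in a scratch directory so the appends start from empty files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample: two colliding ids and one id containing "/".
printf '%s\n' '[{"id":"1","n":"first"},{"id":1,"n":"second"},{"id":"a/b","n":"third"}]' > sample.json

jq -c '.[] | (.id | if type == "number" then .
              else tostring | gsub("[^A-Za-z0-9_-]";"+") end), .' sample.json |
awk '
  function fn(s) { sub(/^"/,"",s); sub(/"$/,"",s); return s ".json" }
  NR%2 {f = fn($0); next}
  {print >> f; close(f)}
'

# 1.json should hold both colliding records, one per line.
cat 1.json
cat 'a+b.json'
```

Because ">>" appends, rerunning the command in the same directory adds the records again, so it is probably best to start from an empty output directory.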

Upvotes: 2

oguz ismail

Reputation: 50795

For each element in the array, print the id and the element itself on two separate lines; you can then take the id from the odd-numbered lines and print the even-numbered lines to files named after that id.

jq -cr '.[] | .id, .' big.json | awk 'NR%2{f=$0".json";next} {print >f;close(f)}'
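A quick way to try this on a small input is sketched below; sample.json and the ids "x" and "y" are invented for the demo. The sample deliberately repeats an id, which the question says does not happen in the real data, to show what ">" does in that case:

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample; the id "x" appears twice on purpose.
printf '%s\n' '[{"id":"x","n":1},{"id":"y","n":2},{"id":"x","n":3}]' > sample.json

jq -cr '.[] | .id, .' sample.json |
  awk 'NR%2{f=$0".json";next} {print >f;close(f)}'

# Each file is closed after writing, so reopening it with ">" truncates it:
# x.json ends up holding only the last record with id "x".
cat x.json
cat y.json
```

As long as the ids really are unique and safe to use as file names, each file receives exactly one object.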

Upvotes: 2
