rgov

Reputation: 4329

Batch processing of a large JSON file with jq

I have a large JSON file, an array of many objects, that I want to submit to an API that accepts bulk data uploads. I've learned that I can use jq's stream mode to avoid loading the entire file into memory:

jq --stream -nc 'fromstream(1|truncate_stream(inputs))' < data.json | curl ...
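
For example, with a small stand-in data.json of just [{"a":1},{"b":2},{"c":3}], this emits one compact object per line:

$ jq --stream -nc 'fromstream(1|truncate_stream(inputs))' < data.json
{"a":1}
{"b":2}
{"c":3}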

I'd like to batch this so that I make one request per batch of, say, 100 objects at a time.

Upvotes: 2

Views: 1283

Answers (3)

Jeff Mercado

Reputation: 134811

If your input is an array, then the paths will all start with a number (the index into the array). You could effectively paginate that array by filtering by the indices.

$ jq --stream -n --argjson skip 0 --argjson top 100 '
[fromstream(1|truncate_stream(
    inputs | . as [[$index]] | select($index >= $skip and $index < $skip + $top)
))]
' data.json | curl ...

Just set the skip argument to the appropriate offset.
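
If you want to drive this from a shell loop, here is a rough sketch (the curl details are elided as in the examples above, and note that each iteration re-streams the whole file):

skip=0
top=100
while :; do
  # Extract the next page of $top elements starting at offset $skip.
  batch=$(jq --stream -n --argjson skip "$skip" --argjson top "$top" '
    [fromstream(1|truncate_stream(
        inputs | . as [[$index]] | select($index >= $skip and $index < $skip + $top)
    ))]' data.json)
  [ "$batch" = "[]" ] && break   # past the end of the array
  printf '%s' "$batch" | curl --silent ... --data-binary @- >/dev/null
  skip=$((skip + top))
done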

I set up an example in the playground so you can play around with it: jqplay

Upvotes: 1

rgov

Reputation: 4329

I came up with this using the mapfile built-in from Bash 4:

while mapfile -t -n 100 LINES && ((${#LINES[@]})); do
    echo "Uploading ${#LINES[@]} records..."
    # -t strips each trailing newline; printf re-adds exactly one per record.
    printf '%s\n' "${LINES[@]}" | curl --silent ... --data-binary @- >/dev/null
done < <(jq --stream -cn 'fromstream(1|truncate_stream(inputs))' < data.json)

Upvotes: 1

peak

Reputation: 116680

You could use GNU parallel:

< data.json jq --stream -nc '
    fromstream(1|truncate_stream(inputs))' |
  parallel --pipe -N100 curl ...
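
With --pipe -N100, parallel feeds each curl invocation a block of 100 lines on its standard input, so curl should read its upload body from stdin, for example (the remaining curl options still elided):

< data.json jq --stream -nc '
    fromstream(1|truncate_stream(inputs))' |
  parallel --pipe -N100 curl --silent ... --data-binary @-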

Or more generically:

< data.json jq --stream -nc '
    fromstream( inputs|(.[0] |= .[1:]) | select(. != [[]]) )' |
  parallel --pipe -N100 curl ...
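
To see what the generic filter does, here is the raw --stream output for a tiny stand-in input: the filter strips the leading array index from each path and drops the top-level closing event [[]], so fromstream reassembles the individual elements.

$ echo '[{"a":1},{"b":2}]' | jq -c --stream .
[[0,"a"],1]
[[0,"a"]]
[[1,"b"],2]
[[1,"b"]]
[[1]]
$ echo '[{"a":1},{"b":2}]' | jq --stream -nc '
    fromstream( inputs|(.[0] |= .[1:]) | select(. != [[]]) )'
{"a":1}
{"b":2}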

Upvotes: 2
