Reputation: 4329
I have a large JSON file containing an array of many objects that I want to submit to an API that accepts bulk data uploads. I've learned that I can use jq's stream mode to avoid loading the entire file into memory:
jq --stream -nc 'fromstream(1|truncate_stream(inputs))' < data.json | curl ...
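For example, given a hypothetical three-element data.json, this emits one object per line:
$ cat data.json
[{"id":1,"name":"a"},{"id":2,"name":"b"},{"id":3,"name":"c"}]
$ jq --stream -nc 'fromstream(1|truncate_stream(inputs))' < data.json
{"id":1,"name":"a"}
{"id":2,"name":"b"}
{"id":3,"name":"c"}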
I'd like to batch this so that I make one request per batch of, say, 100 objects at a time.
Upvotes: 2
Views: 1283
Reputation: 134811
If your input is an array, then the paths will all start with a number (the index into the array). You could effectively paginate that array by filtering by the indices.
$ jq --stream -n --argjson skip 0 --argjson top 100 '
  [fromstream(1|truncate_stream(
    inputs | . as [[$index]] | select($index >= $skip and $index < $skip + $top)
  ))]
' data.json | curl ...
Just set the skip argument to the appropriate offset.
I set up an example on jqplay so you can play around with it.
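A rough driver loop (untested sketch; the curl arguments are placeholders) could bump $skip by $top until a page comes back empty. Note that every pass re-reads data.json from the start:
skip=0
top=100
while :; do
  batch=$(jq --stream -n --argjson skip "$skip" --argjson top "$top" '
    [fromstream(1|truncate_stream(
      inputs | . as [[$index]] | select($index >= $skip and $index < $skip + $top)
    ))]' data.json)
  [ "$batch" = "[]" ] && break            # empty page: nothing left to upload
  printf '%s\n' "$batch" | curl ... --data-binary @-
  skip=$((skip + top))
done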
Upvotes: 1
Reputation: 4329
I came up with this using the mapfile built-in from Bash 4:
# Read up to 100 lines at a time into the LINES array; stop when nothing more is read.
while mapfile -n 100 LINES && ((${#LINES[@]})); do
  echo "Uploading ${#LINES[@]} records..."
  # Without -t, mapfile keeps each element's trailing newline, so the objects
  # remain newline-separated when echoed together.
  echo "${LINES[@]}" | curl --silent ... --data-binary @- >/dev/null
done < <(jq --stream -cn 'fromstream(1|truncate_stream(inputs))' < data.json)
Upvotes: 1
Reputation: 116680
You could use GNU parallel:
< data.json jq --stream -nc '
  fromstream(1|truncate_stream(inputs))' |
  parallel --pipe -N100 curl ...
Or more generically:
< data.json jq --stream -nc '
  fromstream( inputs | (.[0] |= .[1:]) | select(. != [[]]) )' |
  parallel --pipe -N100 curl ...
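In both variants, parallel --pipe hands each block of 100 lines to curl on its stdin, so the curl command must read the payload from stdin. A sketch with a hypothetical endpoint and content type (adjust both to your API):
< data.json jq --stream -nc 'fromstream(1|truncate_stream(inputs))' |
  parallel --pipe -N100 \
    curl --silent -H 'Content-Type: application/x-ndjson' \
         --data-binary @- https://api.example.com/bulk   # hypothetical endpoint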
Upvotes: 2