Reputation: 17483
I'm building a JSON record with jq and need to generate some text, pipe it through an MD5 function, and use the resulting digest as the value of a key:
echo '{"first": "John", "last": "Big"}' | jq '. | { id: (.first + .last) | md5 }'
From looking at the manual and the GitHub issues I can't figure out how to do this, since a jq function can't call out to a shell and there is no builtin that provides hash-like functionality.
A better example of what I'm looking for is this:
echo '{"first": "John", "last": "Big"}' | jq '. | {first, last, id: (.first + .last | md5) }'
to output:
{
  "first": "John",
  "last": "Big",
  "id": "cda5c2dd89a0ab28a598a6b22e5b88ce"
}
A little more context: I'm creating NDJSON files for use with esbulk, and I need to generate a unique key for each record. Initially, I thought piping out to the shell would be the simplest solution, so that I could easily use sha1sum or some other hash function, but that is proving more challenging than I expected.
With an array of records, what I'm looking for is this:
echo '[{"first": "John", "last": "Big"}, {"first": "Justin", "last": "Frozen"}]' | jq -c '.[] | {first, last, id: (.first + .last | md5) }'
to output:
{"first":"John","last":"Big","id":"cda5c2dd89a0ab28a598a6b22e5b88ce"}
{"first":"Justin","last":"Frozen","id":"af97f1bd8468e013c432208c32272668"}
Upvotes: 2
Views: 1556
Reputation: 5123
I adapted the accepted answer's script to my case and am posting it here in case it is useful to someone.
input.json:
{"date":100,"text":"some text","name":"april"}
{"date":200,"text":"a b c","name":"may"}
{"date":300,"text":"some text","name":"april"}
output.json:
{"date":100,"text":"some text","name":"april","id":"4d93d51945b88325c213640ef59fc50b"}
{"date":200,"text":"a b c","name":"may","id":"3da904d79fb03e6e3936ff2127039b1a"}
{"date":300,"text":"some text","name":"april","id":"4d93d51945b88325c213640ef59fc50b"}
The bash script to generate output.json:
cat input.json |
while read -r line ; do
  jq -r '.text' <<< "$line" | md5 |
    jq -c --argjson line "$line" -R '$line + {id: .}' \
      >> output.json
done
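The script above relies on the md5 command, which ships with macOS and the BSDs; on most Linux systems the equivalent is md5sum, which also prints a file-name column after the digest. Below is a minimal sketch of a wrapper that works with either, assuming one of the two is on the PATH. The name md5_hex is a hypothetical helper, not something from the accepted answer.

```shell
# md5_hex: read stdin, print only the 32-character hex digest.
# Hypothetical helper; prefers GNU md5sum and falls back to BSD/macOS md5.
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | cut -d ' ' -f1
  else
    md5 -q
  fi
}

# Example: hash a string without a trailing newline.
printf '%s' 'hello' | md5_hex
```

Replacing md5 with md5_hex in the loop should then let the same script run on either platform.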
Upvotes: 1
Reputation: 116957
Here is an efficient solution to the restated problem. There are altogether just two calls to jq, no matter the length of the array:
json='[{"first": "John", "last": "Big"}, {"first": "Justin", "last": "Frozen"}]'
echo "$json" |
jq -c '.[] | [.[]] | add' |
while read -r line ; do echo "$line" | md5 ; done |
jq -s -R --argjson json "$json" 'split("\n")
| map(select(length>0))
| . as $in
| reduce range(0;length) as $i ($json; .[$i].id = $in[$i])'
This produces an array. Just tack on |.[] at the end to produce a stream of the elements.
Or a bit more tersely, with the goal of emitting one object per line without calling jq within the loop:
jq -c --slurpfile md5 <(jq -c '.[] | [.[]] | add' <<< "$json" |
while read -r line ; do printf '"%s"' $(md5 <<< "$line" ) ; done) \
'[., $md5] | transpose[] | .[0] + {id: .[1]}' <<< "$json"
The question says: "I need to generate a unique key for each record."
It would therefore make sense to compute the digest based on each entire JSON object (or more generally, the entire JSON value), i.e. use jq -c '.[]'
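As a rough sketch of that suggestion, the loop below hashes each compact object as a whole rather than a concatenation of selected fields. The -S flag (sort keys) is an addition here, on the assumption that two records with the same content but different key order should receive the same id. md5sum and cut are assumed to be available, and unlike the solutions above this version calls jq once per record.

```shell
json='[{"first": "John", "last": "Big"}, {"first": "Justin", "last": "Frozen"}]'

# Emit one compact, key-sorted object per line, hash the whole line,
# then append the digest to the record as "id".
out=$(
  jq -c -S '.[]' <<< "$json" |
  while read -r line; do
    id=$(printf '%s' "$line" | md5sum | cut -d ' ' -f1)
    jq -c --arg id "$id" '. + {id: $id}' <<< "$line"
  done
)
printf '%s\n' "$out"
```

Because the digest covers every field, any change to a record changes its id, which may or may not be what is wanted for an index key.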
Upvotes: 0
Reputation: 116957
Using tee allows a pipeline to be used, e.g.:
echo '{"first": "John", "last": "Big"}' |
tee >( jq -r '.first + .last' | md5 | jq -R '{id: .}') |
jq -s add
Output:
{
  "first": "John",
  "last": "Big",
  "id": "cda5c2dd89a0ab28a598a6b22e5b88ce"
}
The following uses a while loop to iterate through the elements of the array, but it calls jq twice at each iteration. For a solution that does not call jq at all within the loop, see elsewhere on this page.
echo '[{"first": "John", "last": "Big"}, {"first": "Justin", "last": "Frozen"}]' |
jq -c '.[]' |
while read -r line ; do
  jq -r '[.[]]|add' <<< "$line" | md5 |
    jq --argjson line "$line" -R '$line + {id: .}'
done
Upvotes: 3
Reputation: 17483
Looking around a little further, I found this: "jq json parser hash the field value", which was helpful in getting to my answer:
echo '[{"first": "John", "last": "Big"}, {"first": "Justin", "last": "Frozen"}]' > /tmp/testfile
jsonfile="/tmp/testfile"
jq -c '.[]' "$jsonfile" | while read -r jsonline ; do
  # parse the JSON line, build the pre-ID text, hash it with md5sum, and store the digest in a variable
  id="$(jq -s -j -r '.[] | .first + .last' <<<"$jsonline" | md5sum | cut -d ' ' -f1)"
  # pass the stored digest as an argument and add it to the existing JSON line
  jq --arg id "$id" -s -c '.[] | .id = "\($id)"' <<<"$jsonline"
done
Output:
{"first":"John","last":"Big","id":"467ffeee8fea6aef01a6ffdcaf747782"}
{"first":"Justin","last":"Frozen","id":"fda76523d5259c0b586441dae7c2db85"}
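These ids differ from the ones in the question. A likely reason is that -j suppresses the trailing newline that jq -r would otherwise print, so md5sum sees different bytes. Here is a small sketch of that effect using only printf and md5sum (no jq involved):

```shell
# The same text hashed with and without a trailing newline.
with_newline=$(printf '%s\n' 'JohnBig' | md5sum | cut -d ' ' -f1)
without_newline=$(printf '%s' 'JohnBig' | md5sum | cut -d ' ' -f1)

echo "with newline:    $with_newline"
echo "without newline: $without_newline"
```

Whichever form is chosen, it needs to be used consistently wherever the ids are regenerated.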
Upvotes: 1
Reputation: 92884
jq + md5sum trick:
json_data='{"first": "John", "last": "Big"}'
jq -r '.first + .last| @sh' <<<"$json_data" | md5sum | cut -d' ' -f1 \
| jq -R --argjson data "$json_data" '$data + {id: .}'
Sample output:
{
  "first": "John",
  "last": "Big",
  "id": "f9e1e448a766870605b863e23d3fdbd8"
}
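Note that @sh wraps the string in single quotes for safe reuse in a shell command, so the text being hashed here is the quoted form rather than the bare concatenation; that would account for an id different from the one in the question. A quick sketch to see the difference (assumes jq is installed):

```shell
json_data='{"first": "John", "last": "Big"}'

# With @sh the string is shell-quoted; without it the raw text is printed.
quoted=$(jq -r '.first + .last | @sh' <<< "$json_data")
plain=$(jq -r '.first + .last' <<< "$json_data")

echo "$quoted"   # 'JohnBig'
echo "$plain"    # JohnBig
```

Dropping @sh should make the digest match the approaches that hash the plain jq -r output.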
Upvotes: 0