Reputation: 43
I'm using jq (jq-json-processor) in a shell script to parse JSON.
I have two JSON files and want to merge them into a single file.
Here is the content of the files:
file1:
{"tag_id" : ["t1"], "inst_id" : "s1"}
{"tag_id" : ["t1"], "inst_id" : "s2"}
file2:
{"tag_id" : ["t2"], "inst_id" : "s1"}
{"tag_id" : ["t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}
expected result:
{"tag_id" : ["t1","t2"], "inst_id" : "s1"}
{"tag_id" : ["t1","t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}
Upvotes: 3
Views: 1965
Reputation: 116740
The following approach is very efficient in that:
(a) it takes advantage of the fact that file1.json and file2.json are streams of objects, thus avoiding the memory required to store these objects as arrays;
(b) it avoids sorting (as entailed, for example, by group_by).
The key concept is the keywise-addition of objects. For performing keywise-addition of objects in a stream, we define the following generic function:
# s is assumed to be a stream of mutually
# compatible objects in the sense that, given
# any key of any object, the values at that key
# must be compatible w.r.t. `add`
def keywise_add(s):
reduce s as $x ({};
reduce ($x|keys_unsorted)[] as $k (.;
.[$k] += $x[$k]));
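As a standalone illustration (a sketch assuming jq ≥ 1.5 is on your PATH, which provides inputs and keys_unsorted), keywise_add can be exercised on a small two-object stream:

```shell
# Feed a two-object stream into keywise_add; values at shared keys are
# combined with `add` semantics (here: array concatenation).
printf '%s\n' '{"a":[1]}' '{"a":[2],"b":[3]}' |
jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.;
        .[$k] += $x[$k]));
  keywise_add(inputs)'
# → {"a":[1,2],"b":[3]}
```

Note that -n is required so that inputs sees the first object as well.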
The task can now be accomplished as follows:
keywise_add(inputs | {(.inst_id): .tag_id} )
| keys_unsorted[] as $k
| {tag_id: .[$k], inst_id: $k}
With the above program in add.jq, the invocation:
jq -c -n -f add.jq file1.json file2.json
yields:
{"tag_id":["t1","t2"],"inst_id":"s1"}
{"tag_id":["t1","t2"],"inst_id":"s2"}
{"tag_id":["t2"],"inst_id":"s3"}
The above assumes that inst_id is string-valued. If that is not the case, the approach can still be used so long as there are no collisions amongst the values of inst_id|tostring, which would be the case, for example, if inst_id were always numeric.
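For illustration, here is a hedged sketch of that tostring variant with numeric inst_id values (the key change is {(.inst_id|tostring): .tag_id}; note that inst_id is then emitted as a string):

```shell
# Numeric inst_id values are keyed via tostring; the merged objects
# therefore carry inst_id back as a string ("1" rather than 1).
printf '%s\n' \
  '{"tag_id":["t1"],"inst_id":1}' \
  '{"tag_id":["t2"],"inst_id":1}' \
  '{"tag_id":["t2"],"inst_id":2}' |
jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.;
        .[$k] += $x[$k]));
  keywise_add(inputs | {(.inst_id|tostring): .tag_id})
  | keys_unsorted[] as $k
  | {tag_id: .[$k], inst_id: $k}'
# → {"tag_id":["t1","t2"],"inst_id":"1"}
#   {"tag_id":["t2"],"inst_id":"2"}
```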
Upvotes: 0
Reputation: 116740
Here's a join-like approach. It assumes your jq has INDEX/2 and supports the --slurpfile command-line option. If your jq does not have these, now would be a good time to upgrade, though there are easy workarounds.
jq -n --slurpfile file1 file1.json -f join.jq file2.json
def join(s2; joinField; field):
INDEX(.[]; joinField)
| reduce s2 as $x (.;
($x|joinField) as $key
| if .[$key] then (.[$key]|field) += ($x|field)
else .[$key] = $x
end )
| .[]
;
$file1 | join(inputs; .inst_id; .tag_id)
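A complete, reproducible run of this answer might look as follows (a sketch assuming jq ≥ 1.6, where INDEX/2 is a builtin; -c is added to get compact single-line output, and the scratch-directory setup is only for the demo):

```shell
cd "$(mktemp -d)"            # scratch directory for the sample files

printf '%s\n' '{"tag_id":["t1"],"inst_id":"s1"}' \
              '{"tag_id":["t1"],"inst_id":"s2"}' > file1.json
printf '%s\n' '{"tag_id":["t2"],"inst_id":"s1"}' \
              '{"tag_id":["t2"],"inst_id":"s2"}' \
              '{"tag_id":["t2"],"inst_id":"s3"}' > file2.json

cat > join.jq <<'EOF'
def join(s2; joinField; field):
  INDEX(.[]; joinField)
  | reduce s2 as $x (.;
      ($x|joinField) as $key
      | if .[$key] then (.[$key]|field) += ($x|field)
        else .[$key] = $x
        end )
  | .[];

$file1 | join(inputs; .inst_id; .tag_id)
EOF

jq -c -n --slurpfile file1 file1.json -f join.jq file2.json
# → {"tag_id":["t1","t2"],"inst_id":"s1"}
#   {"tag_id":["t1","t2"],"inst_id":"s2"}
#   {"tag_id":["t2"],"inst_id":"s3"}
```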
Upvotes: 0
Reputation: 116740
One way is to use group_by:
jq -n --slurpfile file1 file1.json --slurpfile file2 file2.json -f merge.jq
where merge.jq contains:
def sigma(f): reduce f as $x (null; . + $x);
$file1 + $file2
| group_by(.inst_id)[]
| {tag_id: sigma(.[].tag_id), inst_id: .[0].inst_id }
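For reference, the same run can be reproduced with the program passed inline rather than via merge.jq (a sketch assuming jq ≥ 1.5; -c is added for compact output, and the scratch-directory setup is only for the demo):

```shell
cd "$(mktemp -d)"            # scratch directory for the sample files

printf '%s\n' '{"tag_id":["t1"],"inst_id":"s1"}' \
              '{"tag_id":["t1"],"inst_id":"s2"}' > file1.json
printf '%s\n' '{"tag_id":["t2"],"inst_id":"s1"}' \
              '{"tag_id":["t2"],"inst_id":"s2"}' \
              '{"tag_id":["t2"],"inst_id":"s3"}' > file2.json

# group_by(.inst_id) sorts and buckets the combined arrays by inst_id;
# sigma then adds (concatenates) the tag_id arrays within each bucket.
jq -c -n --slurpfile file1 file1.json --slurpfile file2 file2.json '
  def sigma(f): reduce f as $x (null; . + $x);
  $file1 + $file2
  | group_by(.inst_id)[]
  | {tag_id: sigma(.[].tag_id), inst_id: .[0].inst_id}'
# → {"tag_id":["t1","t2"],"inst_id":"s1"}
#   {"tag_id":["t1","t2"],"inst_id":"s2"}
#   {"tag_id":["t2"],"inst_id":"s3"}
```

Unlike the stream-based answer above, this one slurps both files into memory and sorts, which is fine for small inputs.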
Upvotes: 1