崇德方

Reputation: 43

How to merge JSON files using jq?

I'm using jq (the command-line JSON processor) in a shell script to parse JSON.

I have two JSON files and want to merge them into a single file, combining the tag_id arrays of objects that share the same inst_id.

Here are the contents of the files:

file1:

{"tag_id" : ["t1"], "inst_id" : "s1"}
{"tag_id" : ["t1"], "inst_id" : "s2"}

file2:

{"tag_id" : ["t2"], "inst_id" : "s1"}
{"tag_id" : ["t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}

expected result:

{"tag_id" : ["t1","t2"], "inst_id" : "s1"}
{"tag_id" : ["t1","t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}

Upvotes: 3

Views: 1965

Answers (3)

peak

Reputation: 116740

The following approach is efficient for two reasons:

(a) it takes advantage of the fact that file1.json and file2.json are streams of objects, thus avoiding the memory required to store these objects as arrays;

(b) it avoids sorting (as entailed, for example, by group_by).

The key concept is the keywise addition of objects: values found at the same key are combined with jq's + operator, so arrays are concatenated. To perform keywise addition over a stream of objects, we define the following generic function:

# s is assumed to be a stream of mutually
# compatible objects in the sense that, given
# any key of any object, the values at that key
# must be compatible w.r.t. `add`
def keywise_add(s):
  reduce s as $x ({};
     reduce ($x|keys_unsorted)[] as $k (.; 
       .[$k] += $x[$k]));
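
To see what keywise addition does in isolation, here is a minimal sketch that applies the function to a small made-up stream. The two objects and the keys "a" and "b" are illustrative assumptions, not data from the question.

```shell
# Hypothetical two-object stream; the comma builds a stream, so this is one argument.
result=$(jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.;
        .[$k] += $x[$k]));
  keywise_add({"a": [1]}, {"a": [2], "b": [3]})
')
echo "$result"
```

With a jq that provides keys_unsorted (1.5 or later), this should print {"a":[1,2],"b":[3]}: the arrays at the shared key "a" are concatenated, and "b" is carried over unchanged.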

The task can now be accomplished as follows:

keywise_add(inputs | {(.inst_id): .tag_id} )
| keys_unsorted[] as $k
| {tag_id: .[$k], inst_id: $k}

Invocation

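With the definition and the pipeline above saved together in add.jq (the -n option is needed so that inputs sees every object, including the first), the invocation: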

jq -c -n -f add.jq file1.json file2.json

yields:

{"tag_id":["t1","t2"],"inst_id":"s1"}
{"tag_id":["t1","t2"],"inst_id":"s2"}
{"tag_id":["t2"],"inst_id":"s3"}

Caveat

The above assumes that inst_id is string-valued, since jq object keys must be strings. If that is not the case, the approach can still be used by keying on (.inst_id|tostring), so long as there are no collisions amongst the inst_id|tostring values, which would be the case, for example, if inst_id were always numeric.
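
As a rough illustration of the caveat, the sketch below feeds in objects with numeric inst_id values and builds the keys with tostring. The three input objects are made up for the example; note that inst_id comes back as a string in the output.

```shell
# Hypothetical input with numeric inst_id values, piped in on stdin.
result=$(printf '%s\n' \
  '{"tag_id": ["t1"], "inst_id": 1}' \
  '{"tag_id": ["t2"], "inst_id": 1}' \
  '{"tag_id": ["t2"], "inst_id": 2}' |
jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.;
        .[$k] += $x[$k]));
  # tostring turns the numeric id into a valid object key
  keywise_add(inputs | {(.inst_id|tostring): .tag_id})
  | keys_unsorted[] as $k
  | {tag_id: .[$k], inst_id: $k}
')
echo "$result"
```

If the original type matters and inst_id is known to be always numeric, replacing inst_id: $k with inst_id: ($k|tonumber) would be one way to restore it.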

Upvotes: 0

peak

Reputation: 116740

Here's a join-like approach. It assumes your jq has INDEX/2 and supports the --slurpfile command-line option. If your jq does not have these, now would be a good time to upgrade, though there are easy workarounds.

Invocation

jq -n --slurpfile file1 file1.json -f join.jq file2.json

join.jq

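# Index the input array by joinField, then fold the stream s2 into that index:
# on a matching key, add s2's field to the existing one; otherwise insert the object.
def join(s2; joinField; field):
  INDEX(.[]; joinField)
  | reduce s2 as $x (.;
      ($x|joinField) as $key
      | if .[$key] then (.[$key]|field) += ($x|field)
        else .[$key] = $x
        end )
  | .[]
  ;

$file1 | join(inputs; .inst_id; .tag_id)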
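
For a jq without INDEX/2, one possible workaround is to define it yourself at the top of join.jq. The definition below is a sketch that is assumed to behave like the builtin for this use case (it keys each item by the stringified index expression). The sample array is made up for the demonstration.

```shell
result=$(jq -c -n '
  # Fallback definition for jq versions without the INDEX builtin (an assumption, not copied from jq).
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);

  # Hypothetical sample data standing in for the slurped file1.json.
  [{"inst_id": "s1", "tag_id": ["t1"]}, {"inst_id": "s2", "tag_id": ["t1"]}]
  | INDEX(.[]; .inst_id)
')
echo "$result"
```

The result is an object keyed by inst_id, which is what lets join look up a matching record with .[$key] instead of scanning the array.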

Upvotes: 0

peak

Reputation: 116740

One way is to use group_by:

jq -n --slurpfile file1 file1.json --slurpfile file2 file2.json -f merge.jq

where merge.jq contains:

# sigma(f): add up the items of the stream f using +
def sigma(f): reduce f as $x (null; . + $x);

$file1 + $file2
| group_by(.inst_id)[]
| {tag_id: sigma(.[].tag_id), inst_id: .[0].inst_id }
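
A more compact variant of the same group_by idea, offered only as a sketch, slurps all input objects into one array with -s and uses the builtin add in place of sigma. Here the question's five objects are piped in on stdin; passing file1.json file2.json as arguments instead should behave the same way.

```shell
# The five objects from the question, supplied on stdin instead of as files.
result=$(printf '%s\n' \
  '{"tag_id": ["t1"], "inst_id": "s1"}' \
  '{"tag_id": ["t1"], "inst_id": "s2"}' \
  '{"tag_id": ["t2"], "inst_id": "s1"}' \
  '{"tag_id": ["t2"], "inst_id": "s2"}' \
  '{"tag_id": ["t2"], "inst_id": "s3"}' |
jq -c -s '
  group_by(.inst_id)[]
  | {tag_id: (map(.tag_id) | add), inst_id: .[0].inst_id}
')
echo "$result"
```

Because group_by sorts by the grouping key, the groups come out ordered by inst_id, and the whole input is held in memory at once.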

Upvotes: 1
