Reputation: 319
I need to merge an array in a series of identically structured, nested JSON files that share the same higher level keys.
The goal is to create a merged file while retaining all of the existing, higher level keys and values.
File 1:
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
}
]
}
]
}
]
}
File 2:
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
Expected result:
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
}
]
},
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
I have tried the following based on answers to other questions of this type with no success:
jq -s '.[0] * .[1]' test1.json test2.json
jq -s add test1.json test2.json
jq -n '[inputs[]]' test{1,2}.json
The following successfully merges the array but lacks the higher level keys and values in the results.
jq -s '.[0].regions[0].services[0] * .[1].regions[0].services[0]' test1.json test2.json
I am assuming there is a simple jq solution to this that is escaping my searches. If not, any combination of jq and bash will work for a solution.
Upvotes: 3
Views: 2444
Reputation: 2222
Combining jq add with a second jq invocation gives us:
jq '.hits.hits' logs.*.json | jq -s add
This merges the hits.hits arrays from all of the logs.*.json files into one big array.
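To make the behaviour concrete, here is a small sketch with two made-up log files (the file names and contents are assumptions, for illustration only). Note that this pattern only concatenates one array per file and drops everything around it, so on its own it does not keep the higher level keys the question asks about.

```shell
# Two hypothetical log files, each holding a .hits.hits array.
tmp=$(mktemp -d)
printf '%s\n' '{"hits":{"hits":[{"id":1},{"id":2}]}}' > "$tmp/logs.1.json"
printf '%s\n' '{"hits":{"hits":[{"id":3}]}}'          > "$tmp/logs.2.json"

# The first jq emits one array per file; the second slurps those arrays
# into an array of arrays and concatenates them with add.
merged=$(jq '.hits.hits' "$tmp"/logs.*.json | jq -c -s add)
echo "$merged"
```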
Upvotes: 0
Reputation: 14635
Here is a solution which converts arrays to objects down to the level of services, merges with * and converts back to the array form. If file1 and file2 contain the sample data then this command:
$ jq -Mn --argfile file1 file1 --argfile file2 file2 '
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| {($a): {($r): $s}} # generate object for each account,region,service
# | debug # uncomment debug here to see stream
;
reduce merge as $x ({}; . * $x) # use '*' to recombine all the objects from merge
# | debug # uncomment debug here to see combined object
| keys[] as $a # for each key (account) of combined object
| {account:$a, regions:[ # construct object with {account, regions array}
.[$a] # for each account
| keys[] as $r # for each key (region) of account object
| {region:$r, services:[ # construct object with {region, services array}
.[$r] # for each region
| keys[] as $s # for each service
| {($s): .[$s]} # generate service object
]} # add service objects to service array
]}' # add region object to regions array
produces
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
},
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
}
]
}
]
}
]
}
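Some jq builds may not accept --argfile (it was marked deprecated in the manual). As a sketch under that assumption, the same pipeline can read the two files as ordinary inputs with -n and inputs, which takes the place of ($file1, $file2). The file names test1.json and test2.json come from the question; the temporary directory is only there to make the example self-contained.

```shell
tmp=$(mktemp -d)
cat > "$tmp/test1.json" <<'EOF'
{"account":"123456789012","regions":[{"region":"one","services":[
  {"groups":[{"GroupId":"123456","GroupName":"foo"},
             {"GroupId":"234567","GroupName":"bar"}]}]}]}
EOF
cat > "$tmp/test2.json" <<'EOF'
{"account":"123456789012","regions":[{"region":"one","services":[
  {"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
                     {"GroupName":"bar","PolicyNames":["all_bar"]}]}]}]}
EOF

result=$(jq -cn '
  def merge:
      inputs                          # each input file in turn
    | .account as $a
    | .regions[]
    | .region as $r
    | .services[] as $s
    | {($a): {($r): $s}}              # one object per account, region, service
  ;
  reduce merge as $x ({}; . * $x)     # recombine with recursive merge
  | keys[] as $a
  | {account: $a, regions: [
        .[$a]
      | keys[] as $r
      | {region: $r, services: [
            .[$r]
          | keys[] as $s
          | {($s): .[$s]}
        ]}
    ]}' "$tmp/test1.json" "$tmp/test2.json")
echo "$result"
```

As in the original command, the services come back in key order (group_policies before groups) because keys returns sorted keys.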
Assembling this step by step provides a better picture of how this works. Start with just this filter
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| $a
;
merge
Since there are two objects (one from file1 and one from file2), this outputs the .account from each:
"123456789012"
"123456789012"
Note that .account as $a does not change the current value of . (the current input).
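A quick way to check this is to compare binding a variable with piping through the same path. This is only a sketch on a cut-down document (the doc variable and the "no regions here" marker are made up for the example):

```shell
doc='{"account":"123456789012","regions":[{"region":"one"}]}'

# After ".account as $a" the input (.) is still the whole document,
# so .regions is still reachable.
pair=$(printf '%s' "$doc" | jq -c '.account as $a | [$a, .regions[0].region]')
echo "$pair"

# Contrast: piping through .account replaces "." with the account string,
# so .regions can no longer be reached from there.
lost=$(printf '%s' "$doc" | jq -c '.account | [., (.regions? // "no regions here")]')
echo "$lost"
```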
Variables allow us to "drill down" into subobjects without losing higher
level context. Consider this filter:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| [$a, $r]
;
merge
which outputs (account, region) pairs:
["123456789012","one"]
["123456789012","one"]
Now we can keep drilling down into services:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[]
| [$a, $r, .]
;
merge
The third element of the array (.) at that point refers to each successive service in the .services array, so this filter generates
["123456789012","one",{"groups":[{"GroupId":"123456","GroupName":"foo"},
{"GroupId":"234567","GroupName":"bar"}]}]
["123456789012","one",{"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
{"GroupName":"bar","PolicyNames":["all_bar"]}]}]
This (complete) merge function:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| {($a): {($r): $s}} # generate object for each account,region,service
;
merge
produces the stream
{"123456789012":{"one":{"groups":[{"GroupId":"123456","GroupName":"foo"},
{"GroupId":"234567","GroupName":"bar"}]}}}
{"123456789012":{"one":{"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
{"GroupName":"bar","PolicyNames":["all_bar"]}]}}}
The important thing to observe is that these are objects which can easily be merged with * by a reduce step:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| {($a): {($r): $s}} # generate object for each account,region,service
;
reduce merge as $x ({}; . * $x) # use '*' to recombine all the objects from merge
reduce initializes its local state (.) to {} and then computes a new state for each result from the merge function by evaluating . * $x, recursively combining the objects merge built from $file1 and $file2:
{"123456789012":{"one":{"groups":[{"GroupId":"123456","GroupName":"foo"},
{"GroupId":"234567","GroupName":"bar"}],
"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
{"GroupName":"bar","PolicyNames":["all_bar"]}]}}}
Note that * stopped merging at the arrays under the 'groups' and 'group_policies' keys. Had we wanted to continue merging, we could create more objects in the merge function. For example, consider this extension:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| (
$s.groups[]? as $g
| {($a): {($r): {groups: {($g.GroupId): $g}}}}
), (
$s.group_policies[]? as $p
| {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
)
;
merge
This merge goes deeper than the previous one, producing
{"123456789012":{"one":{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"}}}}}
{"123456789012":{"one":{"groups":{"234567":{"GroupId":"234567","GroupName":"bar"}}}}}
{"123456789012":{"one":{"group_policies":{"foo":{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]}}}}}
{"123456789012":{"one":{"group_policies":{"bar":{"GroupName":"bar","PolicyNames":["all_bar"]}}}}}
What is important here is that the "groups" and "group_policies" keys now contain objects, which means that in this filter
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| (
$s.groups[]? as $g
| {($a): {($r): {groups: {($g.GroupId): $g}}}}
), (
$s.group_policies[]? as $p
| {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
)
;
reduce merge as $x ({}; . * $x)
the reduce with * will merge groups and group policies instead of overwriting them, generating:
{"123456789012":{"one":{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"},
"234567":{"GroupId":"234567","GroupName":"bar"}},
"group_policies":{"foo":{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
"bar":{"GroupName":"bar","PolicyNames":["all_bar"]}}}}}
Putting this back into the original form will require a little more work, but not much:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| (
$s.groups[]? as $g
| {($a): {($r): {groups: {($g.GroupId): $g}}}}
), (
$s.group_policies[]? as $p
| {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
)
;
reduce merge as $x ({}; . * $x)
| keys[] as $a # for each key (account) of combined object
| {account:$a, regions:[ # construct object with {account, regions array}
.[$a] # for each account
| keys[] as $r # for each key (region) of account object
| {region:$r, services:[ # construct object with {region, services array}
.[$r] # for each region
| {groups: [.groups[]]} # add groups to service
, {group_policies: [.group_policies[]]} # add group_policies to service
]}
]}
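Two small pieces of this filter are worth trying on their own: the []? used while drilling down, and the [.groups[]] used to get back to an array. The following is just a sketch with trimmed-down values, not the full sample data.

```shell
# A service without a "groups" key: .groups is null, and iterating null
# would be an error, but the trailing ? suppresses it and yields nothing.
none=$(jq -cn '{"group_policies":[]} | [.groups[]?]')
echo "$none"

# Going back from the keyed object to an array: .groups[] emits the values
# of the object, and the surrounding [ ] collects them.
back=$(jq -cn '{"groups":{"123456":{"GroupName":"foo"},"234567":{"GroupName":"bar"}}} | [.groups[]]')
echo "$back"
```

One consequence of the second form: since it has no ?, the final reconstruction step expects both groups and group_policies to be present in the merged object.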
Now with this version, suppose our file2 contains a group as well as group_policies, e.g.
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "999",
"GroupName": "baz"
}
]
},
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
With this input, the first version of this solution loses the groups from file1, because * overwrites the groups array. It produced
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
},
{
"groups": [
{
"GroupId": "999",
"GroupName": "baz"
}
]
}
]
}
]
}
This revised version produces
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
},
{
"GroupId": "999",
"GroupName": "baz"
}
]
},
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
Upvotes: 2