Reputation: 7820
I've scoured similar questions/answers trying to solve this issue over the past few days and I believe my amateur jq
skills are preventing me from solving this.
I'm trying to merge duplicate entries; for example... I'd like:
{
  "Version": "2008-10-17",
  "Id": "SomeBucketPolicy",
  "Statement": [
    {
      "Sid": "Stmt1234567890987",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::726481726312:root"
      },
      "Action": [
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy"
      ],
      "Resource": "arn:aws:s3:::it-lab-test"
    },
    {
      "Sid": "Stmt3423424566754",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::726481726312:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::it-lab-test/*"
    },
    {
      "Sid": "SomeAPIUser",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::536415397313:user/SomeAPIUser"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetObjectRetention"
      ],
      "Resource": "arn:aws:s3:::it-lab-test/*"
    },
    {
      "Sid": "SomeAPIUser",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::536415397313:user/SomeAPIUser"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::it-lab-test/*",
        "arn:aws:s3:::another-test-bucket/*",
        "arn:aws:s3:::someother-test-bucket/*"
      ]
    }
  ]
}
...to become:
{
  "Version": "2008-10-17",
  "Id": "SomeBucketPolicy",
  "Statement": [
    {
      "Sid": "Stmt1234567890987",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::726481726312:root"
      },
      "Action": [
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy"
      ],
      "Resource": "arn:aws:s3:::it-lab-test"
    },
    {
      "Sid": "Stmt3423424566754",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::726481726312:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::it-lab-test/*"
    },
    {
      "Sid": "SomeAPIUser",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::536415397313:user/SomeAPIUser"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetObjectRetention",
        "s3:GetObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::it-lab-test/*",
        "arn:aws:s3:::another-test-bucket/*",
        "arn:aws:s3:::someother-test-bucket/*"
      ]
    }
  ]
}
I'd like it to be as flexible and forgiving as possible: arrays should be created wherever a key ends up with multiple values, duplicates within the nested objects should be merged properly as well, etc.
I've tried a variety of approaches, using a multitude of examples/techniques (grouping, mapping, using functions), but I can't get the result I'm looking for: I'm either left with the duplicate, or the data is dropped instead of merged. The closest I've come has been by playing around with the solution from "Remove duplicate values from JSON with jq", but I'm having trouble handling multiple objects nested within the blocks. Any help would be appreciated.
Upvotes: 0
Views: 788
Reputation: 1293
With the permission of the OP, I'm publishing here an alternative solution for the JSON manipulation in the question, based on the walk-path Unix utility jtc:
As I understand the question, the requirement is to "nicely" merge the records containing SomeAPIUser (if the term also appears outside of a record, the walk-path could easily be enhanced). Here is a solution:
$ <file.json jtc -w'<SomeAPIUser>[-1]' -pmi'<SomeAPIUser>1:[-1]' |
jtc -w'<SomeAPIUser>[-2][:]<q>Q:' -p |
jtc -x'<SomeAPIUser>[-2]<>i:<>f[1]<>F' -y' ' -y'[0]' -s
This solution has three steps:
1. jtc -w'<SomeAPIUser>[-1]' -pmi'<SomeAPIUser>1:[-1]' - here the record with the first occurrence of SomeAPIUser is merged (recursively) with all the others (even if there is more than one).
2. jtc -w'<SomeAPIUser>[-2][:]<q>Q:' -p - this step removes all the duplicate records that resulted from the merging in step 1.
3. jtc -x'<SomeAPIUser>[-2]<>i:<>f[1]<>F' -y' ' -y'[0]' -s - in this final step, all arrays with a single JSON element (resulting from step 2), e.g. "Effect": [ "Allow" ], are converted into non-array records, like "Effect": "Allow".
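For readers who do not have jtc at hand, the same three-step idea (merge everything, drop duplicates, unwrap single-element arrays) can be sketched in Python. This is only an illustration of the idea, not a description of how jtc works internally; the function names merge_all, dedupe and unwrap are made up for this sketch, and the two sample records are trimmed versions of the SomeAPIUser statements from the question.

```python
import json

def merge_all(a, b):
    """Step 1: recursive merge that keeps every value by collecting into lists."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            out[k] = merge_all(a[k], v) if k in a else v
        return out
    as_list = lambda v: v if isinstance(v, list) else [v]
    return as_list(a) + as_list(b)

def dedupe(v):
    """Step 2: drop duplicate list elements, keeping first-seen order."""
    if isinstance(v, dict):
        return {k: dedupe(x) for k, x in v.items()}
    if isinstance(v, list):
        seen, out = set(), []
        for x in map(dedupe, v):
            key = json.dumps(x, sort_keys=True)  # hashable stand-in for any JSON value
            if key not in seen:
                seen.add(key)
                out.append(x)
        return out
    return v

def unwrap(v):
    """Step 3: turn single-element lists back into plain values."""
    if isinstance(v, dict):
        return {k: unwrap(x) for k, x in v.items()}
    if isinstance(v, list):
        items = [unwrap(x) for x in v]
        return items[0] if len(items) == 1 else items
    return v

first = {"Sid": "SomeAPIUser", "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:GetObjectRetention"],
         "Resource": "arn:aws:s3:::it-lab-test/*"}
second = {"Sid": "SomeAPIUser", "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:GetObjectTagging"],
          "Resource": ["arn:aws:s3:::it-lab-test/*",
                       "arn:aws:s3:::another-test-bucket/*"]}

merged = unwrap(dedupe(merge_all(first, second)))
print(json.dumps(merged, indent=2))
```

One caveat of this sketch: unwrap also collapses single-element arrays that were already present in the input, which may or may not be wanted.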
With the latest version of jtc, this solution provides more robust behavior:
$ <file.json jtc -w'[Sid]:<SomeAPIUser>[-1]' -pmi'[Sid]:<SomeAPIUser>1:[-1]' |
jtc -w'[Sid]:<>i>SomeAPIUser<[-2]<>i:><Q:' -p |
jtc -x'[Sid]:<>i>SomeAPIUser<[-2]<>i:<>f[1]<>F' -y' ' -y'[0]' -s
- it confines searches for SomeAPIUser to "Sid" labels only (so it is not thrown off when a clashing SomeAPIUser appears under other labels); it also works correctly when only one (or no) record containing "Sid":"SomeAPIUser" is present in the source JSON.
PS. I'm the developer of the Unix jtc tool for JSON manipulation.
Upvotes: 2
Reputation: 116750
Since no specific requirements regarding the merge algorithm have been given, this response will focus instead on an architecture for solving the class of problems suggested by the question.
For the sake of illustration and specificity, though, a commutative, pairwise merge function will be defined as follows:
# Commutative pairwise merge of two JSON values.
def merge(a; b):
  # Merge two objects key by key; a key missing on one side is null there.
  def merge_objects($x; $y):
    (($x|keys_unsorted) + ($y|keys_unsorted) | unique) as $keys
    | reduce $keys[] as $k (null; . + {($k): merge($x[$k]; $y[$k])});
  if a == b then a
  elif a == null then b
  elif b == null then a
  # Test both types against the original input, so b is never evaluated on a type string.
  elif (a|type) == "object" and (b|type) == "object"
  then merge_objects(a; b)
  elif (a|type) == "array" and (b|type) == "array"
  then (a + b) | unique
  elif (a|type) == "array" then a + [b] | unique
  elif (b|type) == "array" then [a] + b | unique
  else [a, b] | unique
  end ;
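For comparison, here is a rough Python port of the pairwise merge, which may make the branches easier to follow for readers less familiar with jq. It is a sketch under assumptions: the helper unique sorts by JSON text, which gives a deterministic order but not necessarily jq's ordering, and Python equality differs from jq's in corner cases (for example, True == 1 in Python).

```python
import json

def unique(items):
    """Deduplicate and sort JSON values (ordering is by JSON text, not jq's rules)."""
    by_key = {json.dumps(x, sort_keys=True): x for x in items}
    return [by_key[k] for k in sorted(by_key)]

def merge(a, b):
    """Pairwise merge mirroring the branches of the jq definition."""
    if a == b:
        return a
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        # A key missing on one side comes back as None, like null in jq.
        return {k: merge(a.get(k), b.get(k)) for k in sorted(set(a) | set(b))}
    as_list = lambda v: v if isinstance(v, list) else [v]
    # Covers array+array, array+scalar, scalar+array and scalar+scalar.
    return unique(as_list(a) + as_list(b))

left = {"Action": ["s3:GetObject", "s3:GetObjectRetention"],
        "Resource": "arn:aws:s3:::it-lab-test/*"}
right = {"Action": ["s3:GetObject", "s3:GetObjectTagging"],
         "Resource": ["arn:aws:s3:::it-lab-test/*",
                      "arn:aws:s3:::another-test-bucket/*"]}

print(json.dumps(merge(left, right), indent=2))
```

Because every list result passes through unique, the order of the two arguments does not affect the outcome for these inputs.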
With any such definition, we may now proceed to an answer:
# input is assumed to be an array of objects to be merged based on the filter f
def merge(f):
  def merge: reduce .[] as $object (null; merge(.; $object));
  group_by(f)
  | map(merge) ;

.Statement |= merge(.Sid)
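The architecture itself (group by a key, then fold each group with whatever pairwise merge is chosen) does not depend on jq. A minimal Python sketch of it follows; group_merge and union_actions are names invented here, union_actions is a deliberately simple stand-in for a real merge function, and "Stmt1" is a made-up Sid.

```python
from functools import reduce
from itertools import groupby

def group_merge(items, key, merge):
    """Group items by key(item), then fold each group with the pairwise merge."""
    ordered = sorted(items, key=key)  # like jq's group_by, this sorts by the key
    return [reduce(merge, group) for _, group in groupby(ordered, key=key)]

def union_actions(a, b):
    """Stand-in pairwise merge: keep a's fields and union the Action lists."""
    return {**a, "Action": sorted(set(a["Action"]) | set(b["Action"]))}

statements = [
    {"Sid": "SomeAPIUser", "Action": ["s3:GetObject", "s3:GetObjectRetention"]},
    {"Sid": "Stmt1", "Action": ["s3:PutObject"]},
    {"Sid": "SomeAPIUser", "Action": ["s3:GetObject", "s3:GetObjectTagging"]},
]

result = group_merge(statements, key=lambda s: s["Sid"], merge=union_actions)
for statement in result:
    print(statement)
```

As with group_by in jq, the output is ordered by the grouping key rather than by first appearance, so statements may come out in a different order than in the input.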
Upvotes: 3