Reputation: 363
I have the following json file
[
{
"clusterName": "cluster1",
"nodes": [
{
"hostname": "server1",
"dse": "6.7.5"
},
{
"hostname": "server2",
"dse": "6.7.5"
}
]
},
{
"clusterName": "cluster2",
"nodes": [
{
"hostname": "server3",
"dse": "6.7.5"
},
{
"hostname": "server4",
"dse": "6.7.5"
}
]
}
]
And I have another json
[
{
"hostname": "server1",
"memorysize": "47.01 GiB",
"processorcount": 12
},
{
"hostname": "server2",
"memorysize": "47.01 GiB",
"processorcount": 12
},
{
"hostname": "server3",
"memorysize": "47.01 GiB",
"processorcount": 10
},
{
"hostname": "server4",
"memorysize": "47.01 GiB",
"processorcount": 11
},
{
"hostname": "server5",
"memorysize": "47.01 GiB",
"processorcount": 12
},
{
"hostname": "server6",
"memorysize": "47.01 GiB",
"processorcount": 12
}
]
I want to join these two jsons to produce the following output
[
{
"clusterName": "cluster1",
"nodes": [
{
"hostname": "server1",
"dse": "6.7.5",
"memorysize": "47.01 GiB",
"processorcount": 12
},
{
"hostname": "server2",
"dse": "6.7.5",
"memorysize": "47.01 GiB",
"processorcount": 12
}
]
},
{
"clusterName": "cluster2",
"nodes": [
{
"hostname": "server3",
"dse": "6.7.5",
"memorysize": "47.01 GiB",
"processorcount": 10
},
{
"hostname": "server4",
"dse": "6.7.5",
"memorysize": "47.01 GiB",
"processorcount": 11
}
]
}
]
Basically the first file has the list of clusters dictionary. with nodes and I have a second file with the list of nodes dictionary.
The solution mentioned didn't work with multiple clusters.
Is there a better to do this in python instead?
Upvotes: 3
Views: 316
Reputation: 116870
the invocation looks more like cat file1 file2 | jq ...
Here's a solution that assumes all the inputs are presented as a stream. This solution also avoids using the -s command-line option.
cat master.json hostnames.json | jq '
# input: an array of objects, each with a "nodes" key
def mergeNode($node):
map(if .hostname == $node.hostname then . + $node else . end);
reduce inputs[] as $n (.; map_values( .nodes |= mergeNode($n) ))'
Notice that the -n command-line option has NOT been specified.
This solution also allows more than one "hostnames" file.
Upvotes: 0
Reputation: 363
I accomplished this using python instead
for cluster in clusters:
for node in cluster["nodes"]:
node.update(list(filter(lambda nodes: nodes['hostname'] == node['hostname'],nodes))[0])
Upvotes: 0
Reputation: 13259
A solution using jq
:
<file1 jq --slurpfile f file2 '
{
clusterName:.[].clusterName,
nodes:map($f[],.nodes)|add|group_by(.hostname)|map(add)
}'
This build an object using both files.
The first field clusterName
is taken from same field of the second file.
The second field nodes
is the combination of objects of both based on hostname
(done with group_by
command)
A tentative answer to the below comment:
I don't think that -s
has any advantages here since you need both files in memory (instead of 1 with --slurpfile
).
In order not to play with indexes, the idea is to test if the field exists or not before using it. You can do this with ?
and //
operator. Together they from a sort of if not ... then ...
.
Here is a possible solution:
jq -s '{
clusterName:(.[][].clusterName?//empty),
nodes:map(.[].nodes[]?//.[])|group_by(.hostname)|map(add)
}' file1 file2
As you can see, the difficulty in both scripts is to "normalize" the objects in order to perform the group_by
operation.
Upvotes: 2