developthou
developthou

Reputation: 363

How to join json files using jq?

I have the following json file

[
  {
    "clusterName": "cluster1",
    "nodes": [
      {
        "hostname": "server1",
        "dse": "6.7.5"
      },
      {
        "hostname": "server2",
        "dse": "6.7.5"
      }
    ]
  },
  {
    "clusterName": "cluster2",
    "nodes": [
      {
        "hostname": "server3",
        "dse": "6.7.5"
      },
      {
        "hostname": "server4",
        "dse": "6.7.5"
      }
    ]
  }
]

And I have another json

[
  {
    "hostname": "server1",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  },
  {
    "hostname": "server2",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  },
  {
    "hostname": "server3",
    "memorysize": "47.01 GiB",
    "processorcount": 10
  },
  {
    "hostname": "server4",
    "memorysize": "47.01 GiB",
    "processorcount": 11
  },
  {
    "hostname": "server5",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  },
  {
    "hostname": "server6",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  }
]

I want to join these two jsons to produce the following output

[
  {
    "clusterName": "cluster1",
    "nodes": [
      {
        "hostname": "server1",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 12
      },
      {
        "hostname": "server2",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 12
      }
    ]
  },
  {
    "clusterName": "cluster2",
    "nodes": [
      {
        "hostname": "server3",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 10
      },
      {
        "hostname": "server4",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 11
      }
    ]
  }
]

Basically the first file has the list of clusters dictionary. with nodes and I have a second file with the list of nodes dictionary.

The solution mentioned didn't work with multiple clusters.

Is there a better to do this in python instead?

Upvotes: 3

Views: 316

Answers (3)

peak
peak

Reputation: 116870

the invocation looks more like cat file1 file2 | jq ...

Here's a solution that assumes all the inputs are presented as a stream. This solution also avoids using the -s command-line option.

cat master.json hostnames.json | jq '
  # input: an array of objects, each with a "nodes" key
  def mergeNode($node): 
    map(if .hostname == $node.hostname then . + $node else . end);
  reduce inputs[] as $n (.; map_values( .nodes |= mergeNode($n) ))'

Notice that the -n command-line option has NOT been specified.

This solution also allows more than one "hostnames" file.

Upvotes: 0

developthou
developthou

Reputation: 363

I accomplished this using python instead

for cluster in clusters:
    for node in cluster["nodes"]:
        node.update(list(filter(lambda nodes: nodes['hostname'] == node['hostname'],nodes))[0])

Upvotes: 0

oliv
oliv

Reputation: 13259

A solution using jq:

<file1 jq --slurpfile f file2 '
  {
     clusterName:.[].clusterName,
     nodes:map($f[],.nodes)|add|group_by(.hostname)|map(add)
  }'

This build an object using both files.
The first field clusterName is taken from same field of the second file.
The second field nodes is the combination of objects of both based on hostname (done with group_by command)


A tentative answer to the below comment:
I don't think that -s has any advantages here since you need both files in memory (instead of 1 with --slurpfile).
In order not to play with indexes, the idea is to test if the field exists or not before using it. You can do this with ? and // operator. Together they from a sort of if not ... then .... Here is a possible solution:

jq -s '{
   clusterName:(.[][].clusterName?//empty),
   nodes:map(.[].nodes[]?//.[])|group_by(.hostname)|map(add)
}' file1 file2

As you can see, the difficulty in both scripts is to "normalize" the objects in order to perform the group_by operation.

Upvotes: 2

Related Questions