kishore

Reputation: 541

Merging several json files with a common array element

I have several groups of JSON files, where each group follows a common data pattern, as shown below:

file 1:

{
  "projects": [
    {
      "id": 15658857,
      "code": "111"
    },
    {
      "id": 15623456,
      "code": "122"
    }
  ],
  "total_entries": 1391,
  "links": {
    "next": "https://api.xxx.com/projects?page=12&per_page=100",
    "last": "https://api.xxx.com/projects?page=14&per_page=100"
  }
}

file 2:

{
  "projects": [
    {
      "id": 15658857,
      "code": "211"
    }
  ],
  "total_entries": 2391,
  "links": {
    "next": "https://api.xxx.com/projects?page=22&per_page=100",
    "last": "https://api.xxx.com/projects?page=24&per_page=100"
  }
} 

file 3:

{
  "projects": [
    {
      "id": 15658857,
      "code": "311"
    },
    {
      "id": 15623456,
      "code": "322"
    },
    {
      "id": 13438719,
      "code": "333"
    }
  ],
  "total_entries": 3391,
  "links": {
    "next": "https://api.xxx.com/projects?page=32&per_page=100",
    "last": "https://api.xxx.com/projects?page=34&per_page=100"
  }
}

The three files above are samples from one group; every file in that group has an array element named "projects". Other groups have the same structure but a different name for the array element. I need to merge all the files of a group into a single file per group. For the files above, the expected output is:

{
  "projects": [
    {
      "id": 15658857,
      "code": "111"
    },
    {
      "id": 15623456,
      "code": "122"
    },
    {
      "id": 15658857,
      "code": "211"
    },
    {
      "id": 15658857,
      "code": "311"
    },
    {
      "id": 15623456,
      "code": "322"
    },
    {
      "id": 13438719,
      "code": "333"
    }
  ],
  "total_entries": 1391
}

I used the following jq code to achieve this.

jq -s ".[0].projects=([.[].projects]|flatten)|.[0] | del(.links)" file[123].json

But I am not happy with this because I have to hard-code the array element name ("projects" in this case). I am looking for a solution where the array element name doesn't need to be specified, so that I can use the same expression for every group of similarly structured files. Thanks for the help.

Upvotes: 1

Views: 182

Answers (2)

peak

Reputation: 116680

The following is essentially the same as @jq170727's solution, but packages the key abstraction into a function that may be worthy of your standard jq library:

# Gather by key all the values of the objects in a stream
def buckets(stream): reduce stream as $x ({};
  reduce ($x|keys_unsorted[]) as $key (.;
    .[$key] += [$x[$key]] ) );

With this in place, the solution becomes simply:

buckets(inputs) | map_values(add) | del(.links)
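To see why map_values(add) is needed, it can help to look at the intermediate result of buckets. The following is a minimal sketch that assumes jq is on the PATH; the two miniature documents are hypothetical stand-ins for the real files, fed on stdin rather than as file arguments:

```shell
# Hypothetical miniature inputs standing in for file1.json and file2.json.
sample='{"projects":[1,2],"total_entries":10} {"projects":[3],"total_entries":20}'

# The definition from above, kept in a shell variable so it can be reused.
buckets_def='def buckets(stream): reduce stream as $x ({};
  reduce ($x|keys_unsorted[]) as $key (.;
    .[$key] += [$x[$key]] ) );'

# Step 1: every key collects one value per input document.
bucketed=$(printf '%s\n' "$sample" | jq -nc "$buckets_def buckets(inputs)")
echo "$bucketed"
# → {"projects":[[1,2],[3]],"total_entries":[10,20]}

# Step 2: add collapses each bucket (arrays are concatenated, numbers summed).
merged=$(printf '%s\n' "$sample" | jq -nc "$buckets_def buckets(inputs) | map_values(add)")
echo "$merged"
# → {"projects":[1,2,3],"total_entries":30}
```

Because the key names come from the data itself, nothing in the program mentions "projects".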

Standard Library

For example, if your standard jq library is in ~/.jq/jq/jq.jq then you could use the following one-liner:

jq -n 'include "jq"; buckets(inputs) | map_values(add) | del(.links)' file{1,2,3}.json
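If you would rather not touch ~/.jq while experimenting, the same include mechanism can be tried with jq's -L option. This is only a sketch: the scratch directory and the sample documents (with an array key named "items") are assumptions made for illustration.

```shell
# Hypothetical scratch directory used as a library path for this sketch;
# the one-liner above relies on ~/.jq/jq/jq.jq instead.
libdir=$(mktemp -d)

# Store the buckets definition as jq.jq so that include "jq" resolves to it.
cat > "$libdir/jq.jq" <<'EOF'
# Gather by key all the values of the objects in a stream
def buckets(stream): reduce stream as $x ({};
  reduce ($x|keys_unsorted[]) as $key (.;
    .[$key] += [$x[$key]] ) );
EOF

# -L puts the directory on jq's module search path.
# The sample documents use "items" rather than "projects" as the array key.
result=$(printf '%s\n' \
    '{"items":[1],"links":{"next":"a"}}' \
    '{"items":[2,3],"links":{"next":"b"}}' |
  jq -L "$libdir" -nc 'include "jq"; buckets(inputs) | map_values(add) | del(.links)')
echo "$result"
# → {"items":[1,2,3]}
```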

Addendum re: total_entries

OP asked:

what do I need to do if I don't want to add the total_entries from each of the file, I would like to take the value from the first file only

The following modification of the above program will use the first-encountered value for total_entries:

buckets(inputs)
| . as $buckets
| map_values(add)
| del(.links) + {total_entries: $buckets["total_entries"][0]}

Upvotes: 2

jq170727

Reputation: 14635

Here is a possible solution assuming your sample data is in file1.json, file2.json and file3.json:

$ jq -Mn '
    reduce inputs as $i ({}; 
     reduce ($i|keys[]) as $k (.; .[$k] += $i[$k])) 
  | del(.links)
' file1.json file2.json file3.json
{
  "projects": [
    {
      "id": 15658857,
      "code": "111"
    },
    {
      "id": 15623456,
      "code": "122"
    },
    {
      "id": 15658857,
      "code": "211"
    },
    {
      "id": 15658857,
      "code": "311"
    },
    {
      "id": 15623456,
      "code": "322"
    },
    {
      "id": 13438719,
      "code": "333"
    }
  ],
  "total_entries": 7173
}

Note that this adds the values of total_entries from each file, giving a different total than the one in the requested output.
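If the first file's total_entries should be kept instead, one possible variation (a sketch, not part of the tested solution above) is to start the reduction from the first input and concatenate only array-valued keys, so scalar fields keep the first file's value. The miniature documents below are hypothetical and are read from stdin:

```shell
# Hypothetical miniature documents shaped like the question's files.
doc1='{"projects":[{"id":1,"code":"111"}],"total_entries":1391,"links":{"next":"x"}}'
doc2='{"projects":[{"id":1,"code":"211"}],"total_entries":2391,"links":{"next":"y"}}'

# Start from the first document; from every later document append only the
# values that are arrays, leaving scalars such as total_entries untouched.
merged=$(printf '%s\n' "$doc1" "$doc2" | jq -nc '
    input as $first
  | reduce inputs as $i ($first;
      reduce ($i | to_entries[] | select(.value | type == "array")) as $e (.;
        .[$e.key] += $e.value))
  | del(.links)
')
echo "$merged"
# → {"projects":[{"id":1,"code":"111"},{"id":1,"code":"211"}],"total_entries":1391}
```

With real files, the jq program would be given the file names as arguments instead of reading stdin.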

Upvotes: 2
