Ailurus
Ailurus

Reputation: 731

Merge JSON files with identical structure into JSON file containing lists

I have some JSON files, all with identical structure (same keys everywhere, corresponding values might differ for some keys). I would like to collect the values associated with certain keys into lists and store those lists as the values associated with those keys in a new JSON file.

As example, consider these three files, where I'm interested in the key number_items and the corresponding values. First file —

[
  {
    "box_id": 1,
    "number_items": 4
  },
  {
    "box_id": 3,
    "number_items": 15
  },
  {
    "box_id": 6,
    "number_items": 2
  }
]

Second file —

[
  {
    "box_id": 1,
    "number_items": 7
  },
  {
    "box_id": 3,
    "number_items": 15
  },
  {
    "box_id": 6,
    "number_items": 4
  }
]

Third file —

[
  {
    "box_id": 1,
    "number_items": 5
  },
  {
    "box_id": 3,
    "number_items": 9
  },
  {
    "box_id": 6,
    "number_items": 0
  }
]

These should be merged into something that looks like this —

[
  {
    "box_id": 1,
    "number_items": [
      4,
      7,
      5
    ]
  },
  {
    "box_id": 3,
    "number_items": [
      15,
      15,
      9
    ]
  },
  {
    "box_id": 6,
    "number_items": [
      2,
      4,
      0
    ]
  }
]

Can this be done using jq? If not, what would be a good way to do this? Note that the actual scenario consists of 150+ files with 3 keys whose values I would like to merge into lists.

Upvotes: 1

Views: 4025

Answers (3)

Jeff Mercado
Jeff Mercado

Reputation: 134841

You can merge files with similar structures by simply passing them all in as input. Their contents will be streamed in in the order they are in.

Then you could just read them in to a single array, group the objects by the box_id then map out the results.

$ jq -n '
    [inputs[]] | group_by(.box_id)
        | map({box_id:.[0].box_id, number_items:map(.number_items)})
' input{1,2,3}.json

produces:

[
  {
    "box_id": 1,
    "number_items": [
      4,
      7,
      5
    ]
  },
  {
    "box_id": 3,
    "number_items": [
      15,
      15,
      9
    ]
  },
  {
    "box_id": 6,
    "number_items": [
      4,
      2,
      0
    ]
  }
]

It seems the order isn't preserved when items are grouped on some platforms. In my case, running on the Windows 64-bit version produces this. So be aware of that if you want to use group_by. There are of course other approaches you could take if you want to avoid using this filter, but this is much more convenient to use.

Upvotes: 2

peak
peak

Reputation: 116740

I would like to collect the values associated with certain keys

Here is a solution which treats all keys, except for the grouping key, in the same way. It also handles missing keys gracefully and does NOT depend on the stability of jq's sort. The solution is based on a generic filter, merge/0, defined as follows:

# Combine an array of objects into a single object, ans, with array-valued keys,
# such that for every key, k, in the i-th object of the input array, a,
# ans[k][i] = a[i][k]
# null is used as padding if a value is missing.
# Example:
# [{a:1, b:2}, {b:3, c:4}] | merge
# produces:
# {"a":[1,null],"b":[2,3],"c":[null,4]}
def merge:
  def allkeys: map(keys) | add | unique;
  allkeys as $allkeys
  | reduce .[] as $in ({};
     reduce $allkeys[] as $k (.;
      . + {($k): (.[$k] + [$in[$k]]) } ));

The solution to the given problem can then be formulated as:

transpose | map(merge) | map( .box_id |= .[0] )

Invocation:

  jq -s -f merge.jq input{1,2,3}.json

Output: as shown in the question.

More robust solution

The above solution assumes uniformity of the ordering by box_id within each file. This assumption seems warranted by the OP requirements, but for safety and robustness, the objects would first be sorted:

map(sort_by(.box_id)) | transpose | map( merge | (.box_id |= .[0]) )

Note that this still assumes that there are no missing values of box_id in any of the input files.

Still more robust solution

If there is a possibility that some of the box_id values might be missing in any of the input files, then it would be appropriate to add the missing values. This can be done with the help of the following filter:

# Input: a matrix of objects (that is, an array of rows of objects),
#   each of which is assumed to have a distinguished field, f,
#   with distinct values on each row;
# Output: a rectangular matrix such that every row, r, of the output
#   matrix includes the elements of the corresponding row of the input
#   matrix, with additional elements as necessary so that (r |
#   map(.id) | sort) is the same for all rows r.
#
def rectanglize(f):
  def ids: [.[][] | f] | unique;
  def it: . as $in | {} | (f = $in);
  ids as $ids
  | map( . + ( $ids - [.[]|f] | map(it) ) )
;  

Putting everything together, the main pipeline becomes:

rectanglize(.id)
| map(sort_by(.box_id))
| transpose 
| map( merge | .box_id |= .[0] )

Upvotes: 1

Elijah Brandyberry
Elijah Brandyberry

Reputation: 207

Depending on where you are trying to save this new file (local vs server), there are several different approaches. As far as I know, there is no possible way to save a file locally without using one of the available plugins (How to write data to a JSON file using Javascript). If you want to save it to a server, this is impossible with JavaScript, and best be done with a background language.

Here is a way to combine the content of several JSON files into your desired format.

// send json files you want combined, and a new file path and name (path/to/filename.json)
  function combineJsonFiles(files, newFileName) {
    var combinedJson = [];
    // iterate through each file 
    $.each(files, function(key, fileName) {
      // load json file
      // wait to combine until loaded. without this 'when().done()', boxes would return 'undefined'
      $.when(loadJsonFile(fileName)).done(function(boxes) {
        // combine json from file with combinedJson array
        combinedJson = combineJson(boxes, combinedJson);
        // check if this is the last file
        if (key == files.length-1) {
          // puts into json format
          combinedJson = JSON.stringify(combinedJson);
          // your json is now ready to be saved to a file
        }
      });
    });
  }

  function loadJsonFile(fileName) {
    return $.getJSON(fileName);
  }



function combineJson(boxes, combinedJson) {
  // iterate through each box 
  $.each(boxes, function(key, box) {
    // use grep to search if this box's id is already included
    var matches = $.grep(combinedJson, function(e) { return e.box_id == box.box_id; });

    // if there are no matches, add box to the combined file
    if (matches.length == 0) {

      var newBox = { box_id: box.box_id };

      // iterate through properties of box
      for (var property in box) {
        // check to ensure that properties are not inherited from base class
        if (box.hasOwnProperty(property)) {
          // will ignore if property is box_id
          if (property !== 'box_id') {
            // box is reformatted to make the property type into array
            newBox[property] = [box[property]];
          }
        }
      }
      combinedJson.push(newBox);
    } else {
      // select first match (there should never be more than one)
      var match = matches[0];

      // iterate through properties of box
      for (var property in box) {
        // check to ensure that properties are not inherited from base class
        if (box.hasOwnProperty(property)) {
          // will ignore if property is box_id
          if (property !== 'box_id') {
            // add property to the already existing box in the combined file
            match[property].push(box[property]);
          }
        }
      }
    }
  });
  return combinedJson;
}

  var jsonFiles = ['path/to/data.json', 'path/to/data2.json', 'path/to/data3.json'];

  combineJsonFiles(jsonFiles, 'combined_json.json');

The JSON output of this will look like:

[{"box_id":1,"number_items":[4,7,5]},{"box_id":3,"number_items":[15,15,9]},{"box_id":6,"number_items":[2,4,0]}]

Hope this helps!

Upvotes: 0

Related Questions