Reputation: 731
I have some JSON files, all with identical structure (same keys everywhere, corresponding values might differ for some keys). I would like to collect the values associated with certain keys into lists and store those lists as the values associated with those keys in a new JSON file.
As example, consider these three files, where I'm interested in the key number_items
and the corresponding values. First file —
[
{
"box_id": 1,
"number_items": 4
},
{
"box_id": 3,
"number_items": 15
},
{
"box_id": 6,
"number_items": 2
}
]
Second file —
[
{
"box_id": 1,
"number_items": 7
},
{
"box_id": 3,
"number_items": 15
},
{
"box_id": 6,
"number_items": 4
}
]
Third file —
[
{
"box_id": 1,
"number_items": 5
},
{
"box_id": 3,
"number_items": 9
},
{
"box_id": 6,
"number_items": 0
}
]
These should be merged into something that looks like this —
[
{
"box_id": 1,
"number_items": [
4,
7,
5
]
},
{
"box_id": 3,
"number_items": [
15,
15,
9
]
},
{
"box_id": 6,
"number_items": [
2,
4,
0
]
}
]
Can this be done using jq
? If not, what would be a good way to do this? Note that the actual scenario consists of 150+ files with 3 keys whose values I would like to merge into lists.
Upvotes: 1
Views: 4025
Reputation: 134841
You can merge files with similar structures by simply passing them all in as input. Their contents will be streamed in in the order they are in.
Then you could just read them in to a single array, group the objects by the box_id
then map out the results.
$ jq -n '
[inputs[]] | group_by(.box_id)
| map({box_id:.[0].box_id, number_items:map(.number_items)})
' input{1,2,3}.json
produces:
[
{
"box_id": 1,
"number_items": [
4,
7,
5
]
},
{
"box_id": 3,
"number_items": [
15,
15,
9
]
},
{
"box_id": 6,
"number_items": [
4,
2,
0
]
}
]
It seems the order isn't preserved when items are grouped on some platforms. In my case, running on the Windows 64-bit version produces this. So be aware of that if you want to use group_by
. There are of course other approaches you could take if you want to avoid using this filter, but this is much more convenient to use.
Upvotes: 2
Reputation: 116740
I would like to collect the values associated with certain keys
Here is a solution which treats all keys, except for the grouping key, in the same way. It also handles missing keys gracefully and does NOT depend on the stability of jq's sort
. The solution is based on a generic filter, merge/0
, defined as follows:
# Combine an array of objects into a single object, ans, with array-valued keys,
# such that for every key, k, in the i-th object of the input array, a,
# ans[k][i] = a[i][k]
# null is used as padding if a value is missing.
# Example:
# [{a:1, b:2}, {b:3, c:4}] | merge
# produces:
# {"a":[1,null],"b":[2,3],"c":[null,4]}
def merge:
def allkeys: map(keys) | add | unique;
allkeys as $allkeys
| reduce .[] as $in ({};
reduce $allkeys[] as $k (.;
. + {($k): (.[$k] + [$in[$k]]) } ));
The solution to the given problem can then be formulated as:
transpose | map(merge) | map( .box_id |= .[0] )
Invocation:
jq -s -f merge.jq input{1,2,3}.json
Output: as shown in the question.
The above solution assumes uniformity of the ordering by box_id
within each file. This assumption seems warranted by the OP requirements, but for safety and robustness, the objects would first be sorted:
map(sort_by(.box_id)) | transpose | map( merge | (.box_id |= .[0]) )
Note that this still assumes that there are no missing values of box_id
in any of the input files.
If there is a possibility that some of the box_id
values might be missing in any of the input files, then it would be appropriate to add the missing values. This can be done with the help of the following filter:
# Input: a matrix of objects (that is, an array of rows of objects),
# each of which is assumed to have a distinguished field, f,
# with distinct values on each row;
# Output: a rectangular matrix such that every row, r, of the output
# matrix includes the elements of the corresponding row of the input
# matrix, with additional elements as necessary so that (r |
# map(.id) | sort) is the same for all rows r.
#
def rectanglize(f):
def ids: [.[][] | f] | unique;
def it: . as $in | {} | (f = $in);
ids as $ids
| map( . + ( $ids - [.[]|f] | map(it) ) )
;
Putting everything together, the main pipeline becomes:
rectanglize(.id)
| map(sort_by(.box_id))
| transpose
| map( merge | .box_id |= .[0] )
Upvotes: 1
Reputation: 207
Depending on where you are trying to save this new file (local vs server), there are several different approaches. As far as I know, there is no possible way to save a file locally without using one of the available plugins (How to write data to a JSON file using Javascript). If you want to save it to a server, this is impossible with JavaScript, and best be done with a background language.
Here is a way to combine the content of several JSON files into your desired format.
// send json files you want combined, and a new file path and name (path/to/filename.json)
function combineJsonFiles(files, newFileName) {
var combinedJson = [];
// iterate through each file
$.each(files, function(key, fileName) {
// load json file
// wait to combine until loaded. without this 'when().done()', boxes would return 'undefined'
$.when(loadJsonFile(fileName)).done(function(boxes) {
// combine json from file with combinedJson array
combinedJson = combineJson(boxes, combinedJson);
// check if this is the last file
if (key == files.length-1) {
// puts into json format
combinedJson = JSON.stringify(combinedJson);
// your json is now ready to be saved to a file
}
});
});
}
function loadJsonFile(fileName) {
return $.getJSON(fileName);
}
function combineJson(boxes, combinedJson) {
// iterate through each box
$.each(boxes, function(key, box) {
// use grep to search if this box's id is already included
var matches = $.grep(combinedJson, function(e) { return e.box_id == box.box_id; });
// if there are no matches, add box to the combined file
if (matches.length == 0) {
var newBox = { box_id: box.box_id };
// iterate through properties of box
for (var property in box) {
// check to ensure that properties are not inherited from base class
if (box.hasOwnProperty(property)) {
// will ignore if property is box_id
if (property !== 'box_id') {
// box is reformatted to make the property type into array
newBox[property] = [box[property]];
}
}
}
combinedJson.push(newBox);
} else {
// select first match (there should never be more than one)
var match = matches[0];
// iterate through properties of box
for (var property in box) {
// check to ensure that properties are not inherited from base class
if (box.hasOwnProperty(property)) {
// will ignore if property is box_id
if (property !== 'box_id') {
// add property to the already existing box in the combined file
match[property].push(box[property]);
}
}
}
}
});
return combinedJson;
}
var jsonFiles = ['path/to/data.json', 'path/to/data2.json', 'path/to/data3.json'];
combineJsonFiles(jsonFiles, 'combined_json.json');
The JSON output of this will look like:
[{"box_id":1,"number_items":[4,7,5]},{"box_id":3,"number_items":[15,15,9]},{"box_id":6,"number_items":[2,4,0]}]
Hope this helps!
Upvotes: 0