Reputation: 48
Given a JSON file of the format:
{
"key00" : {
"key10" : {
"20170405" : {
"val0" : 10,
...
"valn" : 12
},
"20170404" : {
"val0" : 5,
...
"valn" : 43
},
...
},
"key11" : {...},
...
},
"key01" : {...},
"key02" : {...},
...
}
I want to use jq
to decompose the tree into a flattened list whose format is such as below. This procedure should select one particular key in the hierarchy, the date, and for every instance of that date in the tree merge that date's values while making their key's unique based on the value's location in the tree:
[
{
"date" : "20170405",
"key00.key10.val0" : 10,
...
"key00.key10.valn" : 12
},
{
"date" : "20170404",
"key00.key10.val0" : 10,
...
"key00.key10.valn" : 12
},
...
{
"date" : "20170403",
"key0n.key1n.val0" : 10,
...
"key0n.key1n.valn" : 12
},
]
Knowing the nesting structure, assuming it is rigid, I have performed this with a set of for-loops in Perl. But if the structure changes, the program breaks. Also, for every level of hierarchy, I need a for-loop. How would you go about traversing this tree recursively using jq's language?
(I want to use jq because I am already using it to merge many files together of the format in the first code listing, so I figure I could build upon that. The merge is simple: jq -s 'reduce .[] as $x ({}, . * $x)' *.json > merged.json
)
Upvotes: 1
Views: 1232
Reputation: 14635
Here is solution using to_entries to get at the data, put it into a form setpath will accept, group_by to organize by date and reduce with setpath to build the final form.
You can see how this works step by step. First start with
to_entries
| .[]
| .key as $k1
| ( .value | to_entries
| $k1, .[] )
to get to the first key. With my test data that gives me
"key00"
{
"key": "key10",
"value": {
"20170405": {
"val0": 10,
"valn": 12
},
"20170404": {
"val0": 5,
"valn": 43
}
}
}
"key01"
...
then drill down a little more to get to the next key
to_entries
| .[]
| .key as $k1
| ( .value | to_entries
| .[]
| .key as $k2
| ( .value | to_entries
| $k1, $k2, .[] ) )
which gives
"key00"
"key10"
{
"key": "20170405",
"value": {
"val0": 10,
"valn": 12
}
}
{
"key": "20170404",
"value": {
"val0": 5,
"valn": 43
}
}
"key01"
"key11"
...
then a little more to get to the date and the final value
to_entries
| .[]
| .key as $k1
| ( .value | to_entries
| .[]
| .key as $k2
| ( .value | to_entries
| .[]
| .key as $d
| ( .value | to_entries
| .[]
| [$d, [$k1, $k2, .key], .value] ) ) )
Now that we have
[
"20170405",
[
"key00",
"key10",
"val0"
],
10
]
[
"20170405",
[
"key00",
"key10",
"valn"
],
12
]
...
put that back into an array and use group_by, reduce and setpath
[
to_entries
| .[]
| .key as $k1
| ( .value | to_entries
| .[]
| .key as $k2
| ( .value | to_entries
| .[]
| .key as $d
| ( .value | to_entries
| .[]
| [$d, [$k1, $k2, .key], .value]
)
)
)
]
| group_by(.[0])
| .[]
| .[0][0] as $d
| reduce .[] as $e (
{date:$d}
; setpath([$e[1] | join(".")]; $e[2])
)
to get the final answer
{
"date": "20170404",
"key00.key10.val0": 5,
"key00.key10.valn": 43
}
{
"date": "20170405",
"key01.key11.val1": 1,
"key00.key10.valn": 12,
"key00.key10.val0": 10,
"key01.key11.val2": 2
}
{
"date": "20170406",
"key01.key11.val0": 0,
"key01.key11.val9": 9
}
...
Upvotes: 2
Reputation: 116680
This may seem like a rough ride, but to make things a bit more comprehensible, let's start with a helper function which will essentially make a JSON object when given an array of strings and a value, by creating a key from the array using a join character:
# input: [arrayOfStrings, value]
def squish(joinchar): { (.[0] | join(joinchar)): .[1] };
For example, [["a","b"], 10] | squish(".")
emits {"a.b", 10}
The remainder of the solution to the problem is based on the built-in filters paths
and group_by
, which are documented elsewhere, but in brief, paths
emits a stream of string arrays representing the paths; the associated value is then tacked on. The [path, value] arrays are then grouped by date using group_by
. Finally, the results are formatted according to the requirements.
. as $in
| [paths
| select(length==4)
| . as $path
| [ $path, ($in|getpath($path)) ] ]
| group_by( .[0][2] | tonumber ) # sort by numeric value
| map( {date: .[0][0][2] }
+ ( map( del(.[0][2]) | squish(".") ) | add) )
The above solution groups paths globally by date, which seems to be in accordance with the requirements except for the sample output data.
The select(length==4)
criterion that has been used above may have to be modified if the data differs from the sample given.
Upvotes: 1