Reputation: 2298
I have the following JSON file that I would like to parse with jq tool that someone suggested me but I'm new with it. There are 3 parents nodes with the same children names. The parent nodes are MNRs, GNRs and MSNRs and each of them has children named N1, N2, NR_i, NR_f.
{
"Main": {
"Document": "Doc.1",
"Cini": "DDFR",
"List": {
"SubList": {
"CdTa": "ABC",
"NN": "XYZ",
"ND": {
"RiS": {
"RiN": {
"NSE14": {
"MNRs": {
"MRD": [
{
"NR": {
"N1": "393",
"N2": "720",
"SNR": {
"NR_i": "203",
"NR_f": "49994"
}
}
},
{
"NR": {
"N1": "687",
"N2": "345",
"SNR": {
"NR_i": "55005",
"NR_f": "1229996"
}
}
}
]
},
"GNRs": {
"RD": {
"NR": {
"N1": "649",
"N2": "111",
"SNR": {
"NR_i": "55400",
"NR_f": "877"
}
}
}
},
"MSNRs": {
"NR": [
{
"N1": "748",
"N2": "5624",
"SNR": {
"NR_i": "8746",
"NR_f": "7773"
}
},
{
"N1": "124",
"N2": "54",
"SNR": {
"NR_i": "8847",
"NR_f": "5526"
}
}
]
}
},
"NSE12": {
"MBB": "990",
"MRB": "123"
},
"MGE13": {
"TBB": "849",
"TRB": "113"
}
}
}
}
}
}
}
}
With this code I get the following
.Main.List.SubList.ND.RiS.RiN.NSE14.MNRs.MRD
[
{
"NR": {
"N1": "393",
"N2": "720",
"SNR": {
"NR_i": "203",
"NR_f": "49994"
}
}
},
{
"NR": {
"N1": "687",
"N2": "345",
"SNR": {
"NR_i": "55005",
"NR_f": "1229996"
}
}
}
]
And with these commands I get the a columns of individual values for each children and others null.
.. | .N1?
.. | .N2?
.. | .NR_i?
.. | .NR_f?
I'm far from my desired output since I'd like to extract the children for each parent and tabulate in the form below.
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| MNRs | GNRs | MSNRs |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| N1 | N2 | NR_i | NR_f | N1 | N2 | NR_i | NR_f | N1 | N2 | NR_i | NR_f |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| 393 | 720 | 203 | 49994 | 649 | 111 | 55400 | 877 | 748 | 5624 | 8746 | 7773 |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| 687 | 345 | 55005 | 1229996 | | | | | 124 | 54 | 8847 | 5526 |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
May someone help me with this. Thanks in advance.
Upvotes: 0
Views: 419
Reputation: 116790
Since the nature of the input JSON has only been given by example, let's begin by defining a filter for linearizing .NR
:
# Produce a stream of arrays
def linearize:
if type == "array" then .[] | linearize
else [ .N1, .N2, .SNR.NR_i, .SNR.NR_f]
end;
The relevant data can now be extracted while preserving the top-level groups as follows:
.Main.List.SubList.ND.RiS.RiN.NSE14
| [to_entries[]
| [.key]
+ [.value | .. | objects | select(has("NR")) | .NR | [ linearize ]] ]
Because the input JSON is not uniform, it will help to ensure uniformity by augmenting the above pipeline with the following mapping:
| map(if length > 2 then [.[0], [.[1:][][]]] else . end)
This produces a single JSON array structured like this:
[["MNRs",[["393","720","203","49994"]],[["687","345","55005","1229996"]]],
["GNRs", ...
To obtain the first data row of the table from this intermediate result, it will be worthwhile defining a function that will provide the necessary padding:
def row($i; $padding):
. as $in
| [range(0;$padding) | null] as $nulls
| reduce range(0; length) as $ix
([]; . + ($in[$ix][1][$i] // $nulls));
Now the first data row can be obtained by row(0;4), the second by row(1;4), etc.
The total number of data rows would be given by filtering the intermediate data structure through map(.[1] | length) | max
; thus, the data rows can be obtained by tacking the following onto the previous pipeline:
| (map(.[1] | length) | max) as $rows
| range(0; $rows) as $r
| row($r; 4)
| @tsv
Using the -r command-line option and the given sample, the output would be:
393 720 203 49994 649 111 55400 877 748 5624 8746 7773
687 345 55005 1229996 124 54 8847 5526
Adding the headers is left as an exercise :-)
Upvotes: 3