Ger Cas
Ger Cas

Reputation: 2298

How to tabulate nested JSON file with jq

I have the following JSON file that I would like to parse with jq tool that someone suggested me but I'm new with it. There are 3 parents nodes with the same children names. The parent nodes are MNRs, GNRs and MSNRs and each of them has children named N1, N2, NR_i, NR_f.

{
  "Main": {
    "Document": "Doc.1",
    "Cini": "DDFR",
    "List": {
      "SubList": {
        "CdTa": "ABC",
        "NN": "XYZ",
        "ND": {
          "RiS": {
            "RiN": {
              "NSE14": {
                "MNRs": {
                  "MRD": [
                    {
                      "NR": {
                        "N1": "393",
                        "N2": "720",
                        "SNR": {
                          "NR_i": "203",
                          "NR_f": "49994"
                        }
                      }
                    },
                    {
                      "NR": {
                        "N1": "687",
                        "N2": "345",
                        "SNR": {
                          "NR_i": "55005",
                          "NR_f": "1229996"
                        }
                      }
                    }
                  ]
                },
                "GNRs": {
                  "RD": {
                    "NR": {
                      "N1": "649",
                      "N2": "111",
                      "SNR": {
                        "NR_i": "55400",
                        "NR_f": "877"
                      }
                    }
                  }
                },
                "MSNRs": {
                  "NR": [
                    {
                      "N1": "748",
                      "N2": "5624",
                      "SNR": {
                        "NR_i": "8746",
                        "NR_f": "7773"
                      }
                    },
                    {
                      "N1": "124",
                      "N2": "54",
                      "SNR": {
                        "NR_i": "8847",
                        "NR_f": "5526"
                      }
                    }
                  ]
                }
              },
              "NSE12": {
                "MBB": "990",
                "MRB": "123"
              },
              "MGE13": {
                "TBB": "849",
                "TRB": "113"
              }
            }
          }
        }
      }
    }
  }
}

With this code I get the following

.Main.List.SubList.ND.RiS.RiN.NSE14.MNRs.MRD

[
  {
    "NR": {
      "N1": "393",
      "N2": "720",
      "SNR": {
        "NR_i": "203",
        "NR_f": "49994"
      }
    }
  },
  {
    "NR": {
      "N1": "687",
      "N2": "345",
      "SNR": {
        "NR_i": "55005",
        "NR_f": "1229996"
      }
    }
  }
]

And with these commands I get the a columns of individual values for each children and others null.

.. | .N1?
.. | .N2?
.. | .NR_i?
.. | .NR_f?

I'm far from my desired output since I'd like to extract the children for each parent and tabulate in the form below.

+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
|             MNRs              |          GNRs            |           MSNRs          |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| N1   | N2   | NR_i  | NR_f    | N1  | N2  | NR_i  | NR_f | N1  | N2   | NR_i | NR_f |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| 393  | 720  | 203   | 49994   | 649 | 111 | 55400 | 877  | 748 | 5624 | 8746 | 7773 |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+
| 687  | 345  | 55005 | 1229996 |     |     |       |      | 124 | 54   | 8847 | 5526 |
+------+------+-------+---------+-----+-----+-------+------+-----+------+------+------+ 

May someone help me with this. Thanks in advance.

Upvotes: 0

Views: 419

Answers (1)

peak
peak

Reputation: 116790

Since the nature of the input JSON has only been given by example, let's begin by defining a filter for linearizing .NR:

# Produce a stream of arrays
def linearize:
  if type == "array" then .[] | linearize
  else [ .N1, .N2, .SNR.NR_i, .SNR.NR_f]
  end;

The relevant data can now be extracted while preserving the top-level groups as follows:

.Main.List.SubList.ND.RiS.RiN.NSE14
| [to_entries[]
| [.key]
  + [.value | .. | objects | select(has("NR")) | .NR | [ linearize ]] ]

Because the input JSON is not uniform, it will help to ensure uniformity by augmenting the above pipeline with the following mapping:

| map(if length > 2 then [.[0], [.[1:][][]]] else . end)

This produces a single JSON array structured like this:

[["MNRs",[["393","720","203","49994"]],[["687","345","55005","1229996"]]],
 ["GNRs", ...

To obtain the first data row of the table from this intermediate result, it will be worthwhile defining a function that will provide the necessary padding:

def row($i; $padding):
  . as $in
  | [range(0;$padding) | null] as $nulls
  | reduce range(0; length) as $ix 
      ([]; . + ($in[$ix][1][$i] // $nulls));

Now the first data row can be obtained by row(0;4), the second by row(1;4), etc.

The total number of data rows would be given by filtering the intermediate data structure through map(.[1] | length) | max; thus, the data rows can be obtained by tacking the following onto the previous pipeline:

| (map(.[1] | length) | max) as $rows
| range(0; $rows) as $r
| row($r; 4)
| @tsv

Using the -r command-line option and the given sample, the output would be:

393 720 203 49994   649 111 55400   877 748 5624    8746    7773
687 345 55005   1229996                 124 54  8847    5526

Adding the headers is left as an exercise :-)

Upvotes: 3

Related Questions