Reputation: 37404
I found myself in the world of JSON and I'm trying to convert out of it using jq. I'm trying to convert the following structure to CSV:
{
"Action": "A1",
"Group": [
{
"Id": "10",
"Units": [
"1"
]
}
]
}
{
"Action": "A2",
"Group": [
{
"Id": "11",
"Units": [
"2"
]
},
{
"Id": "20",
"Units": []
}
]
}
{
"Action": "A1",
"Group": [
{
"Id": "26",
"Units": [
"1",
"3"
]
}
]
}
{
"Action": "A3",
"Group": null
}
where the Ids are between 10 and 99 and the Units between 1 and 5. Expected output would be (quoted or unquoted, comma-separated or not; I used pipe separators for clarity):
Action|Group|Unit1|Unit2|Unit3|Unit4|Unit5
A1|10|1|0|0|0|0
A2|11|0|1|0|0|0
A2|20|0|0|0|0|0
A1|26|1|0|1|0|0
A3|0|0|0|0|0|0
I've played around with this for a while now (history | grep jq | wc -l says 107) but haven't made any real progress in combining the keys with each other; I'm basically just getting lists of keys (jq n00b).
Update:
Testing the solution (sorry, been a bit s l o w) I noticed that the data also has records with "Group": null, i.e.:
{
"Action": "A3",
"Group": null
}
(the above few lines were added to the main test data set), which results in the error jq: error (at file.json:61): Cannot iterate over null (null). Expected output would be:
A3|0|0|0|0|0|0
Is there an easy way out of that one?
Upvotes: 2
Views: 1642
Reputation: 116730
Here is a solution with minimal memory requirements for the case where the number of "Unit" columns (n) is unknown beforehand. In the first pass, n is computed.
This is for the second pass:
# Output: a stream of arrays.
def synopsis:
inputs
| .Action as $a
| .Group |= (. // [{Id:0, Units:[]}])
| .Group[]
| [$a, .Id, (.Units|map(tonumber-1))];
def h(n): ["Action", "Group", "Unit\(range(1;n+1))"];
# Output: an array suitable for @csv
def stream(n):
def i: reduce .[] as $i ([range(0;n)|0]; .[$i]=1);
.[0:2] + (.[2] | i) ;
h($width), (synopsis | stream($width)) | @csv
Invocation:
jq -rn --argjson width $(jq -n '
[inputs|(.Group//[{Units:[]}])[]|.Units|map(tonumber)|max]|max
' data.json) -f stream.jq data.json
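With the sample data saved as data.json (an assumed filename), the first pass can be run on its own. Note that jq's max treats null (produced by empty Units lists and by defaulted groups) as smaller than any number, so such records are effectively ignored:

```shell
# Assumed filename data.json, holding the sample records from the question.
cat > data.json <<'EOF'
{"Action":"A1","Group":[{"Id":"10","Units":["1"]}]}
{"Action":"A2","Group":[{"Id":"11","Units":["2"]},{"Id":"20","Units":[]}]}
{"Action":"A1","Group":[{"Id":"26","Units":["1","3"]}]}
{"Action":"A3","Group":null}
EOF
# First pass: the largest unit number becomes the column count.
jq -n '[inputs|(.Group//[{Units:[]}])[]|.Units|map(tonumber)|max]|max' data.json
# prints 3
```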
This is the output with the "null" record ({"Action": "A3","Group": null}) appended:
"Action","Group","Unit1","Unit2","Unit3"
"A1","10",1,0,0
"A2","11",0,1,0
"A2","20",0,0,0
"A1","26",1,0,1
"A3",0,0,0,0
Upvotes: 1
Reputation: 14655
Here is a general solution if the set of unit columns isn't known in advance:
def normalize: [ # convert input to array of flattened objects e.g.
inputs # [{"Action":"A1","Group":"10","Unit1":"1"}, ...]
| .Action as $a
| .Group[]
| {Action:$a, Group:.Id}
+ reduce .Units[] as $u ({};.["Unit\($u)"]="1")
];
def columns: # compute column names
[ .[] | keys[] ] | unique ;
def rows($names): # generate row arrays
.[] | [ .[$names[]] ] | map( .//"0" );
normalize | columns as $names | $names, rows($names) | join("|")
Sample Run (assumes the filter is in filter.jq and the data in data.json):
$ jq -Mnr -f filter.jq data.json
Action|Group|Unit1|Unit2|Unit3
A1|10|1|0|0
A2|11|0|1|0
A2|20|0|0|0
A1|26|1|0|1
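The reduce inside normalize can be watched in isolation: it folds a Units list into sparse UnitN keys, leaving the absent units to be filled in later by rows:

```shell
# Fold the unit list into an object with one key per present unit.
jq -nc '["1","3"] | reduce .[] as $u ({}; .["Unit\($u)"]="1")'
# prints {"Unit1":"1","Unit3":"1"}
```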
In this specific problem the ordering done by unique matches the column order we want; if that were not the case, columns would have to be more complicated.
Much of the complexity comes from dealing with not knowing the final set of Unit columns. If the unit set is fixed and reasonably small (e.g. 1-5) a simpler filter can be used:
["\(1+range(5))"] as $units
| ["Action", "Group", "Unit\($units[])"]
, ( inputs
| .Action as $a
| .Group[]
| [$a, .Id, (.Units[$units[]|[.]] | if .!=[] then "1" else "0" end) ]
) | join("|")
Sample Run
$ jq -Mnr '["\(1+range(5))"] as $units | ["Action", "Group", "Unit\($units[])"], (inputs | .Action as $a | .Group[] | [$a, .Id, (.Units[$units[]|[.]] | if .!=[] then "1" else "0" end) ] ) | join("|")' data.json
Action|Group|Unit1|Unit2|Unit3|Unit4|Unit5
A1|10|1|0|0|0|0
A2|11|0|1|0|0|0
A2|20|0|0|0|0|0
A1|26|1|0|1|0|0
Try it online at tio.run or at jqplay.org
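The compact membership test .Units[$units[]|[.]] uses jq's array-index form: for an array input, .[[x]] returns the list of positions at which [x] occurs as a subarray, so the result is non-empty exactly when the unit is present. For example:

```shell
# Positions of the subarray ["3"] in ["1","3"]: present at index 1.
jq -nc '["1","3"] | .[["3"]]'
# prints [1]
# Positions of ["2"]: absent, so the empty list.
jq -nc '["1","3"] | .[["2"]]'
# prints []
```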
To handle the case where Group may be null, the easiest way is to use a variation of peak's suggestion, e.g.:
["\(1+range(5))"] as $units
| ["Action", "Group", "Unit\($units[])"]
, ( inputs
| .Action as $a
| ( .Group // [{Id:"0", Units:[]}] )[] # <-- supply default group if null
| [$a, .Id, (.Units[$units[]|[.]] | if .!=[] then "1" else "0" end) ]
) | join("|")
Try it online at tio.run or jqplay.org
Upvotes: 3
Reputation: 116730
This is for the case where the number of "Unit" columns (n) is unknown beforehand. It avoids reading in the entire file at once, and proceeds in three main steps: the relevant information is collected in a compact form by "synopsis"; n is computed; and the full rows are formed.
For simplicity, the following is for jq version 1.5 or later, and uses @csv. Small tweaks might be needed if jq 1.4 is used, depending on detailed requirements regarding the output.
Invocation: jq -nr -f tocsv.jq input.json
# Input: a stream of JSON objects.
# Output: a stream of arrays.
def synopsis:
inputs
| .Action as $a
| .Group[]
| [$a, .Id, (.Units|map(tonumber-1))];
# Input: an array of arrays
# Output: a stream of arrays suitable for @csv
def stream:
def h(n): ["Action", "Group", "Unit\(range(1;n+1))"];
def i(n): reduce .[] as $i ([range(0;n)|0]; .[$i]=1);
(map(.[2] | max) | max + 1) as $n
| h($n),
(.[] | .[0:2] + (.[2] | i($n)))
;
[synopsis] | stream | @csv
Output:
"Action","Group","Unit1","Unit2","Unit3"
"A1","10",1,0,0
"A2","11",0,1,0
"A2","20",0,0,0
"A1","26",1,0,1
The easiest way to handle the "Group": null cases is probably to add the following line immediately before | .Group[]:
| .Group |= (. // [{Id:"0", Units:[]}])
That way you can also easily change the "default" value of "Id".
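A minimal check of that line on the problematic record: |= updates .Group in place, and // supplies the default only when the left-hand side is null or false:

```shell
jq -nc '{Action:"A3", Group:null} | .Group |= (. // [{Id:"0", Units:[]}])'
# prints {"Action":"A3","Group":[{"Id":"0","Units":[]}]}
```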
Upvotes: 2
Reputation: 116730
This is for the case where the number of "Unit" columns (n) is known beforehand. It is just a variant of @jq170717's implementation.
The use of max is to ensure reasonable behavior if the given value of n is too small; in that case, the number of columns in the output will vary.
The following has been tested with jq versions 1.5 and master; see below for the tweaks necessary for earlier versions of jq.
Invocation: jq -nr -f tocsv.jq data.json
tocsv.jq:
# n is the number of desired "Unit" columns
def tocsv(n):
def h: ["Action", "Group", "Unit\(range(1;n+1))"];
def i(n): reduce .[] as $i ([range(0; [n, 1 + max] | max) | "0"]; .[$i]="1");  # max widens the row if an index exceeds n-1
def p:
inputs
| .Action as $a
| .Group[]
| [$a, .Id] + (.Units | map(tonumber-1) | i(n));
h,p | join(",") ;
tocsv(5)
The above has been written in a way that you can simply replace the call to join by a call to @csv or @tsv if you want all their benefits. In that case, though, you might want to use 0 and 1 rather than "0" and "1" in the indicator function, i.
$ jq -nr -f tocsv.jq data.json
Action,Group,Unit1,Unit2,Unit3
A1,10,1,0,0
A2,11,0,1,0
A2,20,0,0,0
A1,26,1,0,1
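The indicator step can be exercised on its own. Given the 0-based unit indices, the reduce starts from a row of "0" strings and flips the listed positions (a sketch of i with n = 5, not the full filter):

```shell
# Indices 0 and 2 set within a 5-column row.
jq -nc '[0,2] | reduce .[] as $i ([range(0;5)|"0"]; .[$i]="1")'
# prints ["1","0","1","0","0"]
```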
For jq 1.3 or 1.4, change inputs to .[], and use the following incantation:
jq -r -s -f tocsv.jq data.json
The easiest way to handle the "Group": null cases is probably to add the following line immediately before | .Group[]:
| .Group |= (. // [{Id:"0", Units:[]}])
That way you can also easily change the "default" value of "Id".
Upvotes: 2