com1999

Reputation: 23

jq create output in many separate files

Given the following JSON:

    [
    {"_id":{"$oid":"6d2"},"jlo":"ΕΙ AJSB","dd":"d5f"},
    {"_id":{"$oid":"c6d3"},"jlo":"ΕΙ ALKSB","dd":"5d9"},
    {"_id":{"$oid":"b0cc6d4"},"jlo":"ΕΙ AGHTSB","dd":"1b1"},
    {"_id":{"$oid":"6d2"},"jlo":"ΕPOWΙ AJSB","dd":"d5f"},
    {"_id":{"$oid":"c6d3"},"jlo":"ΕGTΙ ALKSB","dd":"5d9"},
    {"_id":{"$oid":"b0cc6d4"},"jlo":"ΕLKΙ AGHTSB","dd":"1b1"}
    ]

What I need is, for each distinct value of the dd element, the unique values of jlo written to a separate file. Each file should be named through a one-to-one mapping that replaces the dd code with a human-readable name:

d5f:departmentone
5d9:departmentalt
1b1:departshort

Desired output: on a per-row basis, each unique value of jlo with the number of times it occurs for that dd value, so that in the end we get something like this:

First file, named departmentone.txt:

    ΕΙ AJSB 1
    ΕPOWΙ AJSB 1

Second file, named departmentalt.txt:

    ΕΙ ALKSB 1
    ΕGTΙ ALKSB 1

Third file, named departshort.txt:

    ΕΙ AGHTSB 2

I have tried map and reduce, group_by, and sort_by, with really poor results.

Upvotes: 2

Views: 1454

Answers (2)

peak

Reputation: 116740

Only one invocation of jq is necessary. To distribute the output among the separate files, you can combine this one invocation with a single invocation of awk, or use a shell loop as illustrated below.

First, here's an illustration of how the shell pipeline would look:

    jq -r --rawfile dd2name dd2name.tsv -f group.jq input.json |
      while IFS=$'\t' read -r f v ; do echo "$v" >> "$f" ; done

This assumes that the mapping to filenames is in a TSV file named dd2name.tsv, and that the following jq program is in group.jq:

    def dict:
      split("\n") | map(select(length>0) | split("\t"))
      | INDEX(.[0]) | map_values(.[1]);

    ($dd2name | dict) as $dict
    | ($dict | keys_unsorted[]) as $dd
    | map(select(.dd == $dd))
    | group_by(.jlo)
    | map("\($dict[$dd])\t\(.[0].jlo) \(length)")[]

As the name suggests, the dict function creates a dictionary mapping .dd values to filenames. It relies on INDEX. If your jq does not have INDEX, now would be an excellent time to upgrade your jq; otherwise, its def can easily be copied from builtin.jq (search for builtin.jq "def INDEX"), or you could replace the last line of the dict def with:

    | reduce .[] as $p ({}; .[$p[0]] = $p[1]);
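The answer does not show dd2name.tsv itself. As a minimal sketch, here is one way to create it from the question's mapping. It assumes the target names should already carry the ".txt" suffix (an assumption, not something the answer states). Each line holds the dd code, a tab, and the filename, which is the layout the dict function splits on:

```shell
#!/bin/sh
# Write the question's dd -> filename mapping as tab-separated lines.
# printf reuses its format for each remaining pair of arguments,
# so this produces one "code<TAB>filename" line per pair.
# Assumption: the filenames already include the ".txt" suffix.
printf '%s\t%s\n' \
  d5f departmentone.txt \
  5d9 departmentalt.txt \
  1b1 departshort.txt > dd2name.tsv
```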

awk-based solution

The following invocation of awk can be used instead of the while ... done command above:

    awk -F\\t 'fn && (fn!=$1) {close(fn)}; {fn=$1; print $2 >> fn}'
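To see what the awk stage does without running jq, the sketch below feeds it a few hand-written lines. They use the same "filename, tab, jlo and count" layout that group.jq emits, but they are illustrative samples and not captured jq output. Because close() is only called when the filename changes, the command relies on all lines for one file arriving together. The jq program satisfies this because it emits one dd group at a time.

```shell
#!/bin/sh
# Work in a scratch directory so the appended (>>) files start out empty.
cd "$(mktemp -d)" || exit 1

# Hand-written sample lines: filename <TAB> "jlo count".
printf '%s\t%s\n' \
  departmentone.txt 'ΕΙ AJSB 1' \
  departmentone.txt 'ΕPOWΙ AJSB 1' \
  departshort.txt 'ΕΙ AGHTSB 1' |
  awk -F\\t 'fn && (fn!=$1) {close(fn)}; {fn=$1; print $2 >> fn}'

# Show what ended up in each file.
for f in departmentone.txt departshort.txt; do
  printf '== %s ==\n' "$f"
  cat "$f"
done
```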

Season to taste

If the dd2name.tsv mapping file does not contain the ".txt" suffix, it can easily be added in any of a variety of ways, according to taste.
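For example, suppose dd2name.tsv held bare names such as departmentone. One option is to append the suffix when the shell loop picks the target file. The sketch below exercises this on hand-written sample lines rather than real jq output, and spells the tab with printf so it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Work in a scratch directory so the appended (>>) file starts out empty.
cd "$(mktemp -d)" || exit 1

tab="$(printf '\t')"   # POSIX-sh spelling of bash's $'\t'

# Sample lines in group.jq's layout, but with bare names (no ".txt").
printf '%s\t%s\n' \
  departmentalt 'ΕΙ ALKSB 1' \
  departmentalt 'ΕGTΙ ALKSB 1' |
  while IFS="$tab" read -r f v; do
    # The suffix is added only when choosing the output file.
    printf '%s\n' "$v" >> "$f.txt"
  done

cat departmentalt.txt
```

The same effect could instead be obtained inside group.jq by writing `"\($dict[$dd]).txt\t\(.[0].jlo) \(length)"` in the final map.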

Note also that the proposed solutions above make some assumptions, notably that the .jlo values do not contain tabs, newlines, or NULs. If any of those assumptions is violated, then some tweaking will be required.

Upvotes: 3

Aaron

Reputation: 24812

I'd do it in three steps: filter the array for the desired dd, group by jlo, then output the jlo of the first item of each group (every group is guaranteed to have at least one) together with the group's length:

    map(select(.dd == "d5f")) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]


Full bash run:

    jq --arg dd d5f --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > departmentone.txt
    jq --arg dd 5d9 --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > departmentalt.txt
    jq --arg dd 1b1 --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > departshort.txt

Supposing you have a file named "mapping.txt" with the following content:

    d5f:departmentone
    5d9:departmentalt
    1b1:departshort

You could extract those codes and labels to generate the files:

    while IFS=: read -r code label; do
        jq --arg dd "$code" --raw-output 'map(select(.dd == $dd)) | group_by(.jlo) | map("\(.[0].jlo) \(length)") | .[]' yourJsonFile > "$label".txt
    done < mapping.txt

Upvotes: 0
