macarthy
macarthy

Reputation: 3069

Creating a CSV from json using jq, based on elements in array

I have the following json format that I need to convert to CSV

[{
            "name": "joe",
            "age": 21,
            "skills": [{
                "lang": "spanish",
                "grade": "47",
                "school": {
                    "name": "my school",
                    "url": "example.com/sp-school"
                }
            }, {
                "lang": "english",
                "grade": "87"
            }]

        },

        {
            "name": "sarah",
            "age": 34,
            "skills": [{
                "lang": "french",
                "grade": "47",
                "school": {
                    "name": "my school",
                    "url": "example.com/sp-school"
                }
            }, {
                "lang": "english",
                "grade": "87"
            }]

        }, {
            "name": "jim",
            "age": 26,
            "skills": [{
                "lang": "spanish",
                "grade": "60"

            }, {
                "lang": "english",
                "grade": "66",
                "school": {
                    "name": "eg school",
                    "url": "eg-school.com"

                }
            }]

        }
    ]

to convert to csv

name,age,grade,school,url,file,line_number
joe,21,47,"my school","example.com/sp-school",sample.json,1
jim,26,60,"","",sample.json,3

So add the top level fields and the object from the skills array if lang=spanish and the school hash from the skills object for spanish if it exists

I'd also like to add the file and line number it came from.

I would like to use jq for the job, but can't figure out the syntax , anyone help me out ?

Upvotes: 1

Views: 740

Answers (1)

peak
peak

Reputation: 116780

With your data in input.json, and the following jq program in tocsv.jq:

.[]
| [.name, .age] +
  (.skills[]
  | select(.lang == "spanish")
  | [.grade, .school.name, .school.url, input_filename, input_line_number] )
| @csv

the invocation:

jq -r -f tocsv.jq  input.json

yields:

"joe",21,"47","my school","example.com/sp-school","input.json",51
"jim",26,"60",,,"input.json",51

If you want the number-valued strings converted to numbers, you could use the "tonumber" filter. If you want the null-valued fields replaced by strings, use e.g. .school.name // ""

Of course this approach doesn't yield a very useful line number. One approach that would yield higher granularity would be to stream the individual objects into jq, but then you'd lose the filename. To recover the filename you could pass it in as an argument. So you would have a pipeline like so:

jq -c '.[]' input.json | jq -r --arg file input.json -f tocsv2.jq

where tocsv2.jq would be like tscsv.jq above but without the initial .[] |, and with $file instead of input_filename.

Finally, please also consider using the TSV format (@tsv) rather than the rather messy CSV format (@csv).

Upvotes: 2

Related Questions