jpl1079

Reputation: 591

CSV to JSON using jq

If you have a csv dataset like this:

name, age, gender
john, 20, male
jane, 30, female
bob, 25, male

Can you get to this:

[ {"name": "john", "age": 20, "gender": "male"},
  {"name": "jane", "age": 30, "gender": "female"},
  {"name": "bob", "age": 25, "gender": "male"} ]

using only jq?

I found this article which shows what I'm trying to do, but it uses a 'manual' mapping of the header fields to the values. I don't need or want to rename the header fields, and I have quite a few of them. I also don't want to have to change the script/command every time the layout changes.

Is it possible to dynamically extract the headers and then combine them with the values in a jq one-liner?

Upvotes: 49

Views: 52527

Answers (10)

Mike Farah

Reputation: 2584

yq (disclaimer I wrote it) supports this out of the box:

yq file.csv -p=csv -o=json

yields:

[
  {
    "name": "john",
    " age": 20,
    " gender": "male"
  },
  {
    "name": "jane",
    " age": 30,
    " gender": "female"
  },
  {
    "name": "bob",
    " age": 25,
    " gender": "male"
  }
]

The original CSV has leading spaces in columns 2 and 3 - not sure if that's a mistake or not. You can trim them by adding an expression:

yq '(... | select(tag == "!!str")) |= trim'  file.csv -p=csv -o=json

This will match all strings and trim leading spaces, yielding:

[
  {
    "name": "john",
    "age": 20,
    "gender": "male"
  },
  {
    "name": "jane",
    "age": 30,
    "gender": "female"
  },
  {
    "name": "bob",
    "age": 25,
    "gender": "male"
  }
]

Upvotes: 31

jq170727

Reputation: 14655

Here is a solution that assumes you run jq with -s and -R options.

[
  [                                               
    split("\n")[]                  # transform csv input into array
  | split(", ")                    # where first element has key names
  | select(length==3)              # and other elements have values
  ]                                
  | {h:.[0], v:.[1:][]}            # {h:[keys], v:[values]}
  | [.h, (.v|map(tonumber?//.))]   # [ [keys], [values] ]
  | [ transpose[]                  # [ [key,value], [key,value], ... ]
      | {key:.[0], value:.[1]}     # [ {"key":key, "value":value}, ... ]
    ]
  | from_entries                   # { key:value, key:value, ... }
]
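
To see how the transpose/from_entries combination works in isolation, here is a quick illustration with made-up data:

jq -n '[["name","age"], ["john",20]] | [ transpose[] | {key:.[0], value:.[1]} ] | from_entries'

which outputs {"name":"john","age":20}.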

Sample run:

jq -s -R -f filter.jq data.csv

Sample output

[
  {
    "name": "john",
    "age": 20,
    "gender": "male"
  },
  {
    "name": "jane",
    "age": 30,
    "gender": "female"
  },
  {
    "name": "bob",
    "age": 25,
    "gender": "male"
  }
]

Upvotes: 4

DarkStar

Reputation: 595

Here is a fairly simple "one-liner" version for jq that will work for "reasonably" sized files; for very large files you would need a version that doesn't use slurp. I'm fairly new to jq and I'm sure there are even better ways to do this (maybe just incrementing an index value instead of storing it in the data). You can replace "split" with ./"\n" and ./"," if you want to make it even shorter and harder to read. NOTE: if you really need to handle the space after the comma, you can split on ", " instead, or add |map(gsub("^\\s+|\\s+$";"")) after the split on comma to trim leading and trailing whitespace (a full variant with this trimming is shown below, after the explanation).

jq -Rs 'split("\n")|map(split(",")|to_entries)|.[0] as $header|.[1:]|map(reduce .[] as $item ({};.[$header[$item.key].value]=$item.value))'

Here is a commented version:

# jq -Rs
split("\n") | map( split(",") | to_entries ) # split lines, split comma & number
  | .[0] as $header # save [0]
  | .[1:] # and then drop it
  | map( reduce .[] as $item ( {}; .[$header[$item.key].value] = $item.value ) )

The top portion is pretty straightforward: split the data on newlines, then split each of those elements on commas; to_entries then turns each of those into key/value entries with numbered keys (0..N): {key: #, value: string}.
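
To see what split plus to_entries produces for a single line, here is a quick illustrative check:

jq -n '"john, 20, male" | split(",") | to_entries'

which yields [{"key":0,"value":"john"},{"key":1,"value":" 20"},{"key":2,"value":" male"}] - note the numbered keys and the leading spaces that are preserved.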

Then it uses map/reduce to take each element and replace it with an object of key/value pairs, using the numbered key to index back into the headers to get the label. For those new to reduce (like me): the first part, up to the semicolon, initializes the 'accumulator' (the thing you modify on each pass over the elements), so .[...] modifies the accumulator and $item is the element currently being processed.
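
Following the NOTE above, a variant that also trims leading and trailing whitespace around each field (same logic, with the gsub added right after the comma split; values still stay strings) might look like this:

jq -Rs 'split("\n")|map(split(",")|map(gsub("^\\s+|\\s+$";""))|to_entries)|.[0] as $header|.[1:]|map(reduce .[] as $item ({};.[$header[$item.key].value]=$item.value))'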

Update: I now have a better version working that doesn't use slurp. Note that it is run WITHOUT the -n option, because the filter relies on jq treating the first line specially: the header line is read as the initial input, and the remaining lines come from inputs:

jq -R 'split(",") as $h|reduce inputs as $in ([]; . += [$in|split(",")|. as $a|reduce range(0,length) as $i ({};.[$h[$i]]=$a[$i])])'

Upvotes: 6

aborruso

Reputation: 5688

With Miller (http://johnkerl.org/miller/doc/) it's very simple. Using this input.csv file

name,age,gender
john,20,male
jane,30,female
bob,25,male

and running

mlr --c2j --jlistwrap cat input.csv

you will get:

[
{ "name": "john", "age": 20, "gender": "male" }
,{ "name": "jane", "age": 30, "gender": "female" }
,{ "name": "bob", "age": 25, "gender": "male" }
]
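
For reference, --c2j is just a keystroke-saving shorthand for --icsv --ojson, so the more explicit equivalent would be:

mlr --icsv --ojson --jlistwrap cat input.csv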

EDIT

This is an old question: the new doc page is https://miller.readthedocs.io/en/latest/

Upvotes: 41

rjurney

Reputation: 5150

This is a one-liner using the NPM library any-json:

# CSV file needs header row
any-json convert data.csv data.json

You can install it with:

npm install -g any-json

If you don't already have Node.js, you will need to install it first.

Upvotes: 0

rickhg12hs

Reputation: 11912

Having done something similar recently, here's another jq one-liner to transform the CSV into a JSON array.

jq --null-input --raw-input '[input|scan("\\w+")] as $header |[inputs as $data |[$header,[$data|scan("\\w+")|tonumber? // .]] |transpose |map({(.[0]):.[1]}) |add]' input.csv
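
The same filter spread over multiple lines, purely for readability (functionally identical):

jq --null-input --raw-input '
  [input | scan("\\w+")] as $header
  | [ inputs as $data
      | [$header, [$data | scan("\\w+") | tonumber? // .]]
      | transpose
      | map({(.[0]): .[1]})
      | add
    ]' input.csv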

Output, given the example input:

[
  {
    "name": "john",
    "age": 20,
    "gender": "male"
  },
  {
    "name": "jane",
    "age": 30,
    "gender": "female"
  },
  {
    "name": "bob",
    "age": 25,
    "gender": "male"
  }
]

Try it on jqplay.org.

Upvotes: 5

JanB

Reputation: 75

It can also be done without reduce syntax:

#! /bin/jq -fRs

split("\n")|map(select(.!="")|split(","))
|.[0] as $headers
|.[1:][]
|with_entries(.key=$headers[.key])
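
Assuming the script is saved as csv2json.jq (and jq lives at /bin/jq, as the shebang expects), it can be run either directly or via jq:

chmod +x csv2json.jq
./csv2json.jq data.csv

# or, without relying on the shebang:
jq -R -s -f csv2json.jq data.csv

Note that this emits one object per row rather than a single array; if you prefer an array like in the question, wrap the last two filter lines (.[1:][] | with_entries(...)) in [ ... ].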

Upvotes: 1

peak

Reputation: 116730

In short - yes, except maybe for the one-liner bit.

jq is often well-suited to text wrangling, and this is especially true of versions with regex support. With regex support, for example, the trimming required by the given problem statement is trivial.
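
For example, with regex support, trimming a single field is just:

jq -n '" male " | sub("^ +";"") | sub(" +$";"")'

which produces "male".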

Since jq 1.5rc1 includes regex support and has been available since Jan 1, 2015, the following program assumes a version of jq 1.5; if you wish to make it work with jq 1.4, then see the two "For jq 1.4" comments.

Please also note that this program does not handle CSV in all its generality and complexity. (For a similar approach that does handle CSV more generally, see https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-file-with-headers-to-json)

# objectify/1 takes an array of string values as inputs, converts
# numeric values to numbers, and packages the results into an object
# with keys specified by the "headers" array
def objectify(headers):
  # For jq 1.4, replace the following line by: def tonumberq: .;
  def tonumberq: tonumber? // .;
  . as $in
  | reduce range(0; headers|length) as $i ({}; .[headers[$i]] = ($in[$i] | tonumberq) );

def csv2table:
  # For jq 1.4, replace the following line by:  def trim: .;
  def trim: sub("^ +";"") |  sub(" +$";"");
  split("\n") | map( split(",") | map(trim) );

def csv2json:
  csv2table
  | .[0] as $headers
  | reduce (.[1:][] | select(length > 0) ) as $row
      ( []; . + [ $row|objectify($headers) ]);

csv2json

Example (assuming csv.csv is the given CSV text file):

$ jq -R -s -f csv2json.jq csv.csv
[
  {
    "name": "john",
    "age": 20,
    "gender": "male"
  },
  {
    "name": "jane",
    "age": 30,
    "gender": "female"
  },
  {
    "name": "bob",
    "age": 25,
    "gender": "male"
  }
]

Upvotes: 44

Michel Hua

Reputation: 1777

As of 2018, a modern no-code solution is to use the Python tool csvkit, which provides a csvjson command: csvjson data.csv > data.json.
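
For example, assuming csvkit is installed via pip:

pip install csvkit
csvjson data.csv > data.json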

See their documentation https://csvkit.readthedocs.io/en/1.0.2/

The toolkit is also very handy and complementary to jq if your script has to deal with both CSV and JSON formats.

You might also want to check out a powerful tool called visidata. There is a screencast case study that is similar to the original poster's use case, and you can also generate scripts from visidata.

Upvotes: 19

Tom

Reputation: 1793

I had a little play and came up with this. It may not be the best way, and I'd be interested to see what your attempts were like, because after all, if we both arrive at a solution, I'm sure it'd be twice as good!

But I would start from something like:

true as $doHeaders
| . / "\n"
| map(. / ", ")
| (if $doHeaders then .[0] else [range(0; (.[0] | length)) | tostring] end) as $headers
| .[if $doHeaders then 1 else 0 end:][]
| . as $values
| keys
| map({($headers[.]): $values[.]})

Working Example

The variable $doHeaders controls whether to read the top line as a header line. In your case you want it as true, but I added it for future SO users and because, well, I had an excellent breakfast today and the weather is lovely, so why not?

Little explanation:

1) . / "\n" Split by line...

2) map(. / ", ") ... and comma (big gotcha: in your version you'll want to use a regex-based split, because as written this will also split on commas inside quotation marks. I just used this because it's terse, and that makes my solution look cool, right?)

3) if $doHeaders then... Here we build the array of keys: either the first row itself (the header strings), or stringified indices based on how many elements the first row has, depending on whether the first row is a header row

4) .[if $doHeaders then 1 else 0 end:] Ok, so trim off the top line if it's a header

5) map({($headers[.]): $values[.]}) Here we go over each row of the former CSV, store the row in $values, and pipe its keys (the indices) along; for each index we then build a {header: value} object.
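
If you want each row collapsed into a single object (as in the question), the array of single-pair objects built in step 5 can be piped through add, which merges an array of objects. For example:

jq -n '[{"name":"john"}, {"age":"20"}, {"gender":"male"}] | add'

which gives {"name":"john","age":"20","gender":"male"}.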

Of course you'll want to use a few regexes to fill in the gotchas, but I hope that starts you on the way.

Upvotes: 10
