Reputation: 591
If you have a csv dataset like this:
name, age, gender
john, 20, male
jane, 30, female
bob, 25, male
Can you get to this:
[ {"name": "john", "age": 20, "gender": "male"},
{"name": "jane", "age": 30, "gender": "female"},
{"name": "bob", "age": 25, "gender": "male"} ]
using only jq?
I found this article which shows what I'm trying to do, but it uses a 'manual' mapping of the header fields to the values. I don't need/want to rename the header fields and have quite a few of them. I would also not want to have to change a script/command every time the layout changes.
Is it possible to dynamically extract the headers and then combine them with the values with a jq one-liner?
Upvotes: 49
Views: 52527
Reputation: 2584
yq (disclaimer I wrote it) supports this out of the box:
yq file.csv -p=csv -o=json
yields:
[
{
"name": "john",
" age": 20,
" gender": "male"
},
{
"name": "jane",
" age": 30,
" gender": "female"
},
{
"name": "bob",
" age": 25,
" gender": "male"
}
]
The original CSV has leading spaces on columns 2 and 3 - not sure if that's a mistake or not. You can trim them by add an expression:
yq '(... | select(tag == "!!str")) |= trim' file.csv -p=csv -o=json
This will match all strings and trim leading spaces, yielding:
[
{
"name": "john",
"age": 20,
"gender": "male"
},
{
"name": "jane",
"age": 30,
"gender": "female"
},
{
"name": "bob",
"age": 25,
"gender": "male"
}
]
Upvotes: 31
Reputation: 14655
Here is a solution that assumes you run jq with -s
and -R
options.
[
[
split("\n")[] # transform csv input into array
| split(", ") # where first element has key names
| select(length==3) # and other elements have values
]
| {h:.[0], v:.[1:][]} # {h:[keys], v:[values]}
| [.h, (.v|map(tonumber?//.))] # [ [keys], [values] ]
| [ transpose[] # [ [key,value], [key,value], ... ]
| {key:.[0], value:.[1]} # [ {"key":key, "value":value}, ... ]
]
| from_entries # { key:value, key:value, ... }
]
Sample run:
jq -s -R -f filter.jq data.csv
Sample output
[
{
"name": "john",
"age": 20,
"gender": "male"
},
{
"name": "jane",
"age": 30,
"gender": "female"
},
{
"name": "bob",
"age": 25,
"gender": "male"
}
]
Upvotes: 4
Reputation: 595
Here is a fairly simple "one-liner" version for jq that will work for "reasonably" sized files, for very large files you would need a version that doesn't use slurp. I'm fairly new to jq and I'm sure there are even better ways to do this (maybe just incr an index value instead of storing in the data). You can replace "split" with ./"\n" and ./"," if you want to make it even shorter and harder to read. NOTE: if you really need the space after the comma can split on ", " or add |map(gsub("^\s+|\s+$";"")) after the split on comma to trim leading & trailing white space.
jq -Rs 'split("\n")|map(split(",")|to_entries)|.[0] as $header|.[1:]|map(reduce .[] as $item ({};.[$header[$item.key].value]=$item.value))'
Here is a commented version:
# jq -Rs
split("\n") | map( split(",") | to_entries ) # split lines, split comma & number
| .[0] as $header # save [0]
| .[1:] # and then drop it
| map( reduce .[] as $item ( {}; .[$header[$item.key].value] = $item.value ) )
The top portion is pretty straight-forward: split the data on newline, then for each of those elements split on comma and then to_entries will turn each of those into key/value entries with a numbering of the keys (0..N): {key:#, value:string}
Then it uses map/reduce to take each element and replace it with an object of key/value pairs using the numbered key to index back into the headers to get the label. For those new to reduce (like me) the first element up to the semi-colon is to initialize the 'accumulator' (the thing you modify each pass over the elements) so .[...] is modifying the accumulator and $item is the object we're operating on.
Update: I got a better version working now that doesn't use slurp, and we do NOT use -n option because it will treat first line specially:
jq -R 'split(",") as $h|reduce inputs as $in ([]; . += [$in|split(",")|. as $a|reduce range(0,length) as $i ({};.[$h[$i]]=$a[$i])])'
Upvotes: 6
Reputation: 5688
with Miller (http://johnkerl.org/miller/doc/) is very simple. Using this input.csv file
name,age,gender
john,20,male
jane,30,female
bob,25,male
and running
mlr --c2j --jlistwrap cat input.csv
You will have
[
{ "name": "john", "age": 20, "gender": "male" }
,{ "name": "jane", "age": 30, "gender": "female" }
,{ "name": "bob", "age": 25, "gender": "male" }
]
EDIT
This is an old question: the new doc page is https://miller.readthedocs.io/en/latest/
Upvotes: 41
Reputation: 5150
This is a one-liner using NPM library any-json:
# CSV file needs header row
any-json convert data.csv data.json
You can install it with:
npm install -g any-json
If you need node.js, download it.
Upvotes: 0
Reputation: 11912
Having done something similar recently, here's another jq
one-liner to transform the CSV into a JSON array.
jq --null-input --raw-input '[input|scan("\\w+")] as $header |[inputs as $data |[$header,[$data|scan("\\w+")|tonumber? // .]] |transpose |map({(.[0]):.[1]}) |add]' input.csv
Output, given the example input:
[
{
"name": "john",
"age": 20,
"gender": "male"
},
{
"name": "jane",
"age": 30,
"gender": "female"
},
{
"name": "bob",
"age": 25,
"gender": "male"
}
]
Try it on jqplay.org.
Upvotes: 5
Reputation: 75
It can also be done without reduce syntax:
#! /bin/jq -fRs
split("\n")|map(select(.!="")|split(","))
|.[0] as $headers
|.[1:][]
|with_entries(.key=$headers[.key])
Upvotes: 1
Reputation: 116730
In short - yes, except maybe for the one-liner bit.
jq is often well-suited to text wrangling, and this is especially true of versions with regex support. With regex support, for example, the trimming required by the given problem statement is trivial.
Since jq 1.5rc1 includes regex support and has been available since Jan 1, 2015, the following program assumes a version of jq 1.5; if you wish to make it work with jq 1.4, then see the two "For jq 1.4" comments.
Please also note that this program does not handle CSV in all its generality and complexity. (For a similar approach that does handle CSV more generally, see https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-file-with-headers-to-json)
# objectify/1 takes an array of string values as inputs, converts
# numeric values to numbers, and packages the results into an object
# with keys specified by the "headers" array
def objectify(headers):
# For jq 1.4, replace the following line by: def tonumberq: .;
def tonumberq: tonumber? // .;
. as $in
| reduce range(0; headers|length) as $i ({}; .[headers[$i]] = ($in[$i] | tonumberq) );
def csv2table:
# For jq 1.4, replace the following line by: def trim: .;
def trim: sub("^ +";"") | sub(" +$";"");
split("\n") | map( split(",") | map(trim) );
def csv2json:
csv2table
| .[0] as $headers
| reduce (.[1:][] | select(length > 0) ) as $row
( []; . + [ $row|objectify($headers) ]);
csv2json
Example (assuming csv.csv is the given CSV text file):
$ jq -R -s -f csv2json.jq csv.csv
[
{
"name": "john",
"age": 20,
"gender": "male"
},
{
"name": "jane",
"age": 30,
"gender": "female"
},
{
"name": "bob",
"age": 25,
"gender": "male"
}
]
Upvotes: 44
Reputation: 1777
As of 2018, a modern no code solution would be to use Python tool csvkit
has csvjson data.csv > data.json
.
See their documentation https://csvkit.readthedocs.io/en/1.0.2/
The toolkit is also very handy and complementary to jq
if your script has to debug both csv
and json
formats.
You might also want to check a powerful tool called visidata. Here is a screencast case study that is similar to the original poster's. You can also generate script from visidata
Upvotes: 19
Reputation: 1793
I had a little play and came up with this. But it may not be the best way, and I'd be interested to see what your attempts were like, because after all if we both came at a solution I'm sure it'd be twice as good!
But I would start from something like:
true as $doHeaders
| . / "\n"
| map(. / ", ")
| (if $doHeaders then .[0] else [range(0; (.[0] | length)) | tostring] end) as $headers
| .[if $doHeaders then 1 else 0 end:][]
| . as $values
| keys
| map({($headers[.]): $values[.]})
The variable $doHeaders
controls whether to read the top line as a header line. In your case you want it as true, but I added it for future SO users and because, well, I had an excellent breakfast today and the weather is lovely, so why not?
Little explanation:
1) . / "\n"
Split by line...
2) map(. / ", ")
... and comma (Big gotcha: In your version, you'll want to use a regex based split because like this you'll split on commas inside quotation marks too. I just used this because it's terse, and that makes my solution look cool right?)
3) if $doHeaders then...
Here we create an array of strings keys or numbers depending on the number of elements in the first row and whether the first row is a header row
4) .[if $doHeaders then 1 else 0 end:]
Ok, so trim off the top line if it's a header
5) map({($headers[.]): $values[.]})
Above we go over each row in the former csv, and put the $values
into a variable and the keys into a pipe. Then we construct your desired object.
Of course you'll want to use a few regexes to fill in the gotchas, but I hope that starts you on the way.
Upvotes: 10