Reputation: 373
I have files which contain N JSON objects separated by commas (,):
{"a":1},{"b":2},{"c":3},{"d":2},{"e":1},{"f":2} ...
I would like to convert them into one JSON array of N objects using jq:
[{"a":1},{"b":2},{"c":3},{"d":2},{"e":1},{"f":2} ...]
I tried jq -R 'split(",")' myfile.json
but it gives me an array of N strings:
[
"{\"a\":1}",
"{\"b\":2}",
"{\"a\":1}",
"{\"b\":2}",
"{\"a\":1}",
"{\"b\":2}",
"{\"a\":1}",
"{\"b\":2}" ....
]
Any idea?
Upvotes: 2
Views: 6996
Reputation: 116690
Since you have millions of these JSON objects, let me first suggest an efficient way to produce a stream of them in JSON Lines format (i.e., with a newline as the delimiter).
WARNING: the following assumes that the objects do not contain JSON strings with commas.
Let's assume the comma-separated objects are in a file named objects.txt. First, create a file, program.jq, with the following jq program:
def one:
  (try input catch null)        # read the next input; null if reading it fails
  | if . == 0 then empty        # the sentinel 0 appended below: stop
    elif . == null then one     # a failed read (e.g. a stray comma): keep going
    else (., one)               # emit the object, then recurse
    end;
one
Then, assuming your shell supports the $'...' quoting used below, the invocation:
(cat objects.txt; echo 0) |
  sed $'s/,/,\\\n/g' |
  jq -n -c -f program.jq
will produce the stream, one JSON object per line. This is a very manageable format. For example, to produce an array, you could pipe the above-mentioned stream into jq -s .
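Concretely, reusing objects.txt and program.jq from above, a sketch of the end-to-end command might look like this (the final jq -s . is what slurps the stream into a single array):
(cat objects.txt; echo 0) |
  sed $'s/,/,\\\n/g' |
  jq -n -c -f program.jq |
  jq -s .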
However, if the goal is solely to produce a JSON array, then as pointed out elsewhere, the most efficient approach would be to enclose the comma-separated objects in square brackets, along the lines of:
(echo "["; cat objects.txt; echo "]")
So the relevant question here, perhaps, is: what is the real goal? It seems doubtful that an unmanageably large array of small JSON objects will be more useful than either the original comma-separated sequence or a simple stream.
Upvotes: 1
Reputation: 47099
You are on the right track; you just need to map fromjson over the resulting array, e.g.:
jq -Rc 'split(",") | map(fromjson)' myfile.json
Output:
[{"a":1},{"b":2},{"c":3},{"d":2},{"e":1},{"f":2}]
However, if you are dealing with huge inputs, perhaps use a more streamable command to split the input into chunks, e.g. with tr:
<myfile.json tr ',' '\n' | jq -c .
Output:
{"a":1}
{"b":2}
{"c":3}
{"d":2}
{"e":1}
{"f":2}
Upvotes: 1