I am trying to split a large JSON file (~4 million elements) into separate files (one file per element).
The file kinda looks like this:
{
"books": [
{
"title": "Professional JavaScript - \"The best guide\"",
"authors": [
"Nicholas C. Zakas"
],
"edition": 3,
"year": 2011
},
{
"title": "Professional JavaScript",
"authors": [
"Nicholas C.Zakas"
],
"edition": 2,
"year": 2009
},
{
"title": "Professional Ajax",
"authors": [
"Nicholas C. Zakas",
"Jeremy McPeak",
"Joe Fawcett"
],
"edition": 2,
"year": 2008
}
]
}
To split each book into a separate file, I am using the following command:
cat books.json | jq -c -M '.books[]' | while read line; do echo $line > temp/$(date +%s%N).json; done
For the last two items, everything's OK, because the book titles do not contain any quotes. However, in the first one, the escaped quotes \" get replaced by plain ", which leads to a broken JSON file, as the subsequent parser (of course) interprets each " as the boundary of an element.
I've tried using jq -r, but that did not help.
I'm using the jq version shipped by CentOS 7:
[root@machine]$ jq --version
jq-1.6
Any suggestions?
You have to use the -r option of read:
while read -r line; do echo "$line" > temp/"$(date +%s%N)".json; done
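The -r option prevents read from interpreting backslash escapes, so \" stays \" instead of turning into ".
You should also quote your variables (and the command substitution in the file name), otherwise the shell applies word splitting and globbing to them.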
See the difference:
$ read var <<< 'quoted quotes: \"\"'
$ echo "$var"
quoted quotes: ""
$ read -r var <<< 'quoted quotes: \"\"'
$ echo "$var"
quoted quotes: \"\"
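Putting both fixes together, here is a minimal sketch of the loop as a reusable function. A few assumptions beyond the answer above: the function name split_lines and its output-directory argument are hypothetical; IFS= is added so read also keeps leading and trailing whitespace (harmless for jq -c output); printf '%s\n' replaces echo because some echo implementations interpret backslashes themselves; and a running counter replaces date +%s%N for the file names, which avoids starting one date process per element.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write each line of stdin to its own file in directory $1.
# Expects one JSON document per line, e.g. the output of: jq -c '.books[]' books.json
split_lines() {
  local outdir=$1
  local n=0
  local line
  mkdir -p -- "$outdir"
  # IFS= keeps surrounding whitespace; -r keeps backslashes such as \" intact.
  while IFS= read -r line; do
    n=$((n + 1))
    # printf '%s\n' writes the line verbatim (echo may interpret escapes in some shells).
    printf '%s\n' "$line" > "$outdir/$n.json"
  done
}
```

For example: jq -c '.books[]' books.json | split_lines temp (this writes temp/1.json, temp/2.json, and so on).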
Using -r with read is almost always what you want; it really should have been the default behaviour.
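If you would rather avoid the shell read loop altogether (a bash loop over ~4 million lines is slow, and any read without -r can silently drop backslashes), one alternative is to let awk write the files, since awk's print copies each input line ($0) unchanged. This is a minimal sketch, not part of the answer above; the function name split_with_awk and the numbered file names are assumptions.

```shell
# Hypothetical helper: write line N of stdin to "$1/N.json" with awk, no shell read loop.
split_with_awk() {
  mkdir -p -- "$1"
  # The directory is passed via the environment, because awk -v would
  # itself interpret backslash escapes in the assigned value.
  OUTDIR="$1" awk '{
    f = ENVIRON["OUTDIR"] "/" NR ".json"
    print > f    # print writes $0 unchanged, so \" stays \"
    close(f)     # close each file so millions of outputs do not exhaust file descriptors
  }'
}
```

For example: jq -c '.books[]' books.json | split_with_awk temp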