user13101903

Reputation: 23

Why does my jq / read / echo pipeline remove backslashes?

I am trying to split a large JSON file (~4 million elements) into separate files (one file per element).

The file kinda looks like this:

{
  "books": [
    {
      "title": "Professional JavaScript - \"The best guide\"",
      "authors": [
        "Nicholas C. Zakas"
      ],
      "edition": 3,
      "year": 2011
    },
    {
      "title": "Professional JavaScript",
      "authors": [
        "Nicholas C.Zakas"
      ],
      "edition": 2,
      "year": 2009
    },
    {
      "title": "Professional Ajax",
      "authors": [
        "Nicholas C. Zakas",
        "Jeremy McPeak",
        "Joe Fawcett"
      ],
      "edition": 2,
      "year": 2008
    }
  ]
}

To split each book into a separate file, I am using the following command:

cat books.json | jq -c -M '.books[]' | while read line; do echo $line > temp/$(date +%s%N).json; done

For the last two items, everything is fine, because the book titles do not contain any quotes. In the first one, however, each \" is replaced by a plain ", which produces a broken JSON file: the subsequent parser, of course, interprets the " as the end of the string.

I've tried to use jq -r, but that did not help.

I'm using the jq version shipped by CentOS 7:

[root@machine]$ jq --version
jq-1.6

Any suggestions?

Upvotes: 2

Views: 718

Answers (1)

Benjamin W.

Reputation: 52112

You have to pass the -r option to read:

while read -r line; do echo "$line" > temp/"$(date +%s%N)".json; done

It prevents read from treating backslashes as escape characters.

And you should quote your variables.
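
As a small illustration of the quoting point (the sample string is made up, not taken from the question's data): an unquoted expansion undergoes word splitting and filename expansion, so echo receives several words and joins them with single spaces. A sketch, assuming the default IFS:

```shell
# Hypothetical sample value containing two consecutive spaces.
line='{"title":"Two  spaces"}'

unquoted=$(echo $line)      # split into words, re-joined with one space each
quoted=$(echo "$line")      # passed to echo as a single, unchanged argument

printf '%s\n' "$unquoted"
printf '%s\n' "$quoted"
```

The first line printed has lost one of the two spaces; the second is identical to the input.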

See the difference -r makes:

$ read var <<< 'quoted quotes: \"\"'
$ echo "$var"
quoted quotes: ""
$ read -r var <<< 'quoted quotes: \"\"'
$ echo "$var"
quoted quotes: \"\"

Using -r with read is almost always what you want and really should have been the default behaviour.
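
Putting it together, here is a sketch of the whole loop. A few things in it are my own choices rather than anything from the question: the here-document stands in for the output of jq -c -M '.books[]', the output directory comes from mktemp, and the files are named with a counter instead of date +%s%N. IFS= is added so that read also keeps leading and trailing whitespace, and printf is used because echo's handling of backslashes differs between shells.

```shell
# Stand-in for `jq -c -M '.books[]' books.json`: one compact JSON object per line.
# The quoted 'EOF' delimiter keeps the backslashes in the sample data literal.
outdir=$(mktemp -d)   # hypothetical output directory
n=0
while IFS= read -r line; do
    n=$((n + 1))
    # Write the line exactly as read; the counter gives each file a unique name.
    printf '%s\n' "$line" > "$outdir/book-$n.json"
done <<'EOF'
{"title":"Professional JavaScript - \"The best guide\"","edition":3}
{"title":"Professional Ajax","edition":2}
EOF

cat "$outdir/book-1.json"
```

With the real file, the here-document would be replaced by piping jq -c -M '.books[]' books.json into the same while loop.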

Upvotes: 2
