Reputation: 110093
I was trying to load data from Google's json export, but it looks like it's not valid JSON (ECMA-404),(RFC 7159),(RFC 4627). Here is what I'm expecting for json newline:
[{},{},{}]
But here is what it's giving:
{}{}{}
Here's an example output from clicking the "Download as JSON" button on a four-row query result:
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OkQ8IMAV","c3":"Luxembourg","c4":"German","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OkQ8IMAV","c3":"Luxembourg","c4":"German","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OjUuEMAV","c3":"Luxembourg","c4":"French - Parisian","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OkQ8IMAV","c3":"Luxembourg","c4":"German","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
Is there a reason why BigQuery is using this export format for json? Are there other Google services or something that are dependent on this format, or why would it be pushing a non-standard json format? (Maybe I'm just misunderstanding json line format). Note, this is from the web-UI, not the API, which gives valid json.
Upvotes: 5
Views: 1213
Reputation: 59165
BigQuery reads and outputs new-line delimited JSON - this because traditional JSON doesn't adapt well to the needs of big data.
See:
The output of "Download as JSON" shown in the question is compatible with the JSON input that BigQuery can read.
Note that the web UI also offers to look at the results of a query as JSON - and those results are formatted as a traditional JSON object. I'm not sure what was the design decision to have this incompatible output here - but results in that form won't be able to be imported back to BigQuery.
So in general, this format is incompatible with BigQuery:
While this is compatible with BigQuery:
Why is this less traditional JSON format the best choice in the big data world? Encapsulating a trillion rows within [...]
defines a single object with a trillion rows - which is hard to parse and handle. New line delimited JSON solves this problem, with each row being an independent object.
Upvotes: 5