Reputation: 1
I have a BQ table that contains potentially very large JSON objects. I require a only one property (key:value pair) from each JSON object, so it is my preference to first select the JSON property directly in my pipeline and avoid reading in the whole JSON object. However, upon inspection i find that the many of the keys contain escaped quotes (e.g. "content": "{"Performance": null, "RatabaseXML": null}). The other fields don't have these escaped quotes, only the content field's keys do.
The BQ methods JSON_EXTRACT and JSON_EXTRACT_SCALAR don't know how to handle the \" characters.
Is there a way to select this JSON attribute and extract the property I care about when the Keys contain escaped quotes?
this is not the same as this post which focused on escaped quotes elsewhere. This JSON does come back as valid from https://jsonformatter.curiousconcept.com/#
Every attempt I try right now oscillates around using replaces (example below). However ideally I do NOT want to do this because this will read in the entire payload object as a string and then do the replace on it, where-as I want to by-pass this for performance reasons and just query the property i care about.
WITH json_sample AS (
SELECT
'{"applicationName": "familycarservice", "content": "{\"Performance\": "cats", \"RatabaseXML\": null}' AS payload
)
SELECT
JSON_EXTRACT_SCALAR(
REPLACE(REPLACE(payload, '\\"', '"'), '"{', '{'),
'$.content.Performance'
) AS Performance
from json_sample;
Upvotes: 0
Views: 558
Reputation: 1713
I don't understand, why are you removing the quotes from JSON? That will invalidate it. I think you are worrying about a problem that does not exist. There is no such thing as a quoted string in BQ. SQL in general uses apostrophes for string ('
). That means that quotes ("
) are perfectly valid to exist and are characters, they do not denote a string, but are part of a string.
So your assumption:
However, upon inspection i find that the many of the keys contain escaped quotes (e.g. "content": "{"Performance": null, "RatabaseXML": null}).
... is not valid in a database (well, I can't guarantee for all of them, I'll say almost all) since JSON and XML characters are part of a string. (so yes, XML is treated the same way).
BQ not supporting \"
is perfectly normal since there is no need to escape a legal character. And of course https://jsonformatter.curiousconcept.com/# will say it's valid because that is what a valid JSON looks like. By removing all quotes you are invalidating it and potentially making it impossible for native functions to even parse it.
(e.g. "content": "{"Performance": null, "RatabaseXML": null})
This is valid JSON so don't touch it.
Furthermore, consider that in 3 years, you'll use a new system that might need this JSON. Now that you "cleaned them" (and therefore invalidated them making them unusable), that future possible app won't be able to use this JSON.
The other fields don't have these escaped quotes, only the content field's keys do.
Well, let's call them "quoted strings", since they're not unescaped, and yes, in JSON strings (and keys) are quoted, but numbers and booleans are not.
As for "that" post, it's a totally different story. That's for JS where strings are defined by quotes and escaping is needed there.
If you check the docs for any of the JSON_ functions (I randomly chose EXTRACT) you'll see they confirm what I said and every example uses quotes because that's how it's supposed to be done. Any member needed by Big Query will obviously be a string and thus used with apostrophes.
The example in the docs confirms everything I said:
SELECT
JSON_EXTRACT(JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}', '$.class')
AS json_data;
So for short: leave the JSON as it is and use the BigQuery JSON_* functions normally. Whoever told you to "clean" that JSON payload is wrong. And if you still have doubts about what I said or the docs, just try it and tell me if it doesn't work. If it doesn't, then there may either be a problem with the app that's sending you the JSON, either with the way it is received or processed in the database or with how it's stored. But in a normal scenario, all of the above explanations should hold.
And for future reference, if you'll find a database or a language where strings are denoted with quotes, even then you would still not remove them from JSON since any JSON parser needs them to properly identify nodes. And yes, when I say "remove" I mean even if it is temporarily in some sort of query.
The only situation where you might want to remove quotes is when displaying the string values and even then it might be situational since JSON modules would prolly do it automatically for you when you query nodes.
Upvotes: 0