Reputation: 1
I'm working with SQL Presto in Athena and in a table I have a column named "data.input.additional_risk_data.basket" that has a json like this:
[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]
I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with jsons but I tried a few things:
json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference)
They all returned null. I'm wondering what is the correct way to get the values from that json Thank you!
Upvotes: 0
Views: 819
Reputation: 143253
There are multiple "problems" with your data and json path selector. Keys are not conventional (and I have not found a way to tell athena to escape them) and your json is actually an array of json objects. What you can do - cast data to an array and process it. For example:
-- sample data
WITH dataset (json_val) AS (
VALUES (json '[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]')
)
--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
select cast(json_val as array(map(varchar, json))) arr
from dataset
)
Output:
item_reference |
---|
"26484651" |
Upvotes: 0