Francisco
Francisco

Reputation: 1

Extracting JSON returns null (Presto Athena)

I'm working with SQL Presto in Athena and in a table I have a column named "data.input.additional_risk_data.basket" that has a json like this:

[
   {
      "data.input.additional_risk_data.basket.val.brand":null,
      "data.input.additional_risk_data.basket.val.category":null,
      "data.input.additional_risk_data.basket.val.item_reference":"26484651",
      "data.input.additional_risk_data.basket.val.name":"Nike Force 1",
      "data.input.additional_risk_data.basket.val.product_name":null,
      "data.input.additional_risk_data.basket.val.published_date":null,
      "data.input.additional_risk_data.basket.val.quantity":"1",
      "data.input.additional_risk_data.basket.val.size":null,
      "data.input.additional_risk_data.basket.val.subCategory":null,
      "data.input.additional_risk_data.basket.val.unit_price":769.0,
      "data.input.additional_risk_data.basket.val.upc":null,
      "data.input.additional_risk_data.basket.val.url":null
   }
]

I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with jsons but I tried a few things:

json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference)

They all returned null. I'm wondering what is the correct way to get the values from that json Thank you!

Upvotes: 0

Views: 819

Answers (1)

Guru Stron
Guru Stron

Reputation: 143253

There are multiple "problems" with your data and json path selector. Keys are not conventional (and I have not found a way to tell athena to escape them) and your json is actually an array of json objects. What you can do - cast data to an array and process it. For example:

-- sample data
WITH dataset (json_val) AS (
    VALUES  (json '[
   {
      "data.input.additional_risk_data.basket.val.brand":null,
      "data.input.additional_risk_data.basket.val.category":null,
      "data.input.additional_risk_data.basket.val.item_reference":"26484651",
      "data.input.additional_risk_data.basket.val.name":"Nike Force 1",
      "data.input.additional_risk_data.basket.val.product_name":null,
      "data.input.additional_risk_data.basket.val.published_date":null,
      "data.input.additional_risk_data.basket.val.quantity":"1",
      "data.input.additional_risk_data.basket.val.size":null,
      "data.input.additional_risk_data.basket.val.subCategory":null,
      "data.input.additional_risk_data.basket.val.unit_price":769.0,
      "data.input.additional_risk_data.basket.val.upc":null,
      "data.input.additional_risk_data.basket.val.url":null
   }
]')

) 

--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
    select cast(json_val as array(map(varchar, json))) arr
    from dataset
)

Output:

item_reference
"26484651"

Upvotes: 0

Related Questions