Reputation: 5699
I have the following JSON data structure in a column in AWS Athena:
[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]
I would like to somehow extract the values of event_id
field, e.g. in the form of a list:
["-3368023833341021830", "5692882176024811076"]
(Though I don't insist on exactly this as long as I can get my event IDs.)
I wanted to use the JSON_EXTRACT
function and thought it uses the very same syntax as jq
. In jq
, I can easily get what I want using the following query syntax:
.[].data.event_id
However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq
. Is there an alternative way to achieve the result I want?
Upvotes: 0
Views: 4164
Reputation: 143243
JSON_EXTRACT
supports quite limited set of json paths. Depending on Athena engine version you can either process column by casting it to array of maps and processing this array via array functions:
-- sample data
with dataset(json_col) as (
values ('[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]')
)
-- query
select transform(
cast(json_parse(json_col) as array(map(varchar, json))),
m -> json_extract(m['data'], '$.event_id'))
from dataset;
Output:
_col0 |
---|
["-3368023833341021830", "5692882176024811076"] |
Or for 3rd Athena engine version you can try using Trino's json_query
:
-- query
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;
Note that return type of two will differ - in first case you will have array(json)
and in the second one - just varchar
.
Upvotes: 2