VoY

Reputation: 5699

How to extract a field from an array of JSON objects in AWS Athena?

I have the following JSON data structure in a column in AWS Athena:

[
  {
    "event_type": "application_state_transition",
    "data": {
      "event_id": "-3368023833341021830"
    }
  },
  {
    "event_type": "application_state_transition",
    "data": {
      "event_id": "5692882176024811076"
    }
  }
]

I would like to extract the values of the event_id field, e.g. in the form of a list:

["-3368023833341021830", "5692882176024811076"]

(Though I don't insist on exactly this as long as I can get my event IDs.)

I wanted to use the JSON_EXTRACT function and thought it uses the very same syntax as jq. In jq, I can easily get what I want using the following query syntax:

.[].data.event_id

However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq. Is there an alternative way to achieve the result I want?

Upvotes: 0

Views: 4164

Answers (1)

Guru Stron

Reputation: 143243

JSON_EXTRACT supports only a limited subset of JSON path syntax. Depending on the Athena engine version, you can either cast the column to an array of maps and process that array with array functions:

-- sample data
with dataset(json_col) as (
    values ('[
  {
    "event_type": "application_state_transition",
    "data": {
      "event_id": "-3368023833341021830"
    }
  },
  {
    "event_type": "application_state_transition",
    "data": {
      "event_id": "5692882176024811076"
    }
  }
]')
)

-- query
select transform(
    cast(json_parse(json_col) as array(map(varchar, json))),
    m -> json_extract(m['data'], '$.event_id'))
from dataset;

Output:

_col0
["-3368023833341021830", "5692882176024811076"]

Or, on Athena engine version 3, you can try Trino's json_query:

-- query (reuses the dataset CTE from the sample data above)
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;

Note that the return types of the two approaches differ: the first returns array(json), while the second returns just varchar.
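
Since the question does not insist on a list, another option is to return one row per event ID by unnesting the parsed array. This is only a minimal sketch built on the same json_parse cast as the first approach; the names t, elem and the event_id column alias are illustrative and not part of the original queries, and the sample data is shortened to the relevant fields.

```sql
-- sample data (shortened to the fields used here)
with dataset(json_col) as (
    values ('[
  {"data": {"event_id": "-3368023833341021830"}},
  {"data": {"event_id": "5692882176024811076"}}
]')
)

-- query: cast the parsed JSON to array(json), expand it into rows,
-- then read the scalar value from each array element as varchar
select json_extract_scalar(elem, '$.data.event_id') as event_id
from dataset
cross join unnest(cast(json_parse(json_col) as array(json))) as t(elem);
```

With this shape each event ID should come back as a plain varchar value in its own row, which can be easier to filter or join on than a single array or JSON string.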

Upvotes: 2
