RobC
RobC

Reputation: 25032

Building a view from a JSON stringified array of objects in the original table column

Given an original table in Athena that is similar to this contrived example below:

Original Table:

a b cmsmessage c
... ... {"asset": {"metadata": {"item": [{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]}}} ...
... ... {"asset": {"metadata": {"item": [{ "key": "id", "value": "456"}, {"key": "author", "value": "John"}, {"key": "version", "value": "3"}]}}} ...
... ... {"asset": {"metadata": {"item": [{"key": "version", "value": "2"}, {"key": "author", "value": "Sally"}, {"key": "id", "value": "789" }]}}} ...

I am trying to create a View from the stringified JSON in the cmsmessage column:

Desired Resultant View:

id author version
123 Rob 1
456 John 3
789 Sally 3

As you can see in the "Desired Resultant View" I'm essentially wanting to extract the data from each object of the item array based on the value of the key value.

[{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]


I've seen/tried various examples that access items in an array by index, e.g. [0] such as the examples in Athena docs. However the index position in the array may vary given the json/string in the cmsmessage column of the original table.


My Failed attempt:

Below shows my failed attempt:

WITH dataset AS 
(
   SELECT *
   FROM (VALUES 
     ('{"asset": {"metadata": {"item": [{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]}}}'),
     ('{"asset": {"metadata": {"item": [{ "key": "id", "value": "456"}, {"key": "author", "value": "John"}, {"key": "version", "value": "3"}]}}}'),
     ('{"asset": {"metadata": {"item": [{"key": "version", "value": "2"}, {"key": "author", "value": "Sally"}, {"key": "id", "value": "789" }]}}}')
  ) AS t (cmsmessage)
)

SELECT
    json_extract_scalar(objArray, '$.key') as _keys,
    json_extract_scalar(objArray, '$.value') as _values
FROM dataset

CROSS JOIN UNNEST(CAST(json_extract(cmsmessage, '$.asset.metadata.item') as array(json))) as t (objArray)

Upvotes: 1

Views: 79

Answers (1)

Guru Stron
Guru Stron

Reputation: 143003

Arguably the easiest option would be using json_query which has better support for JSON path syntax:

SELECT json_query(cmsmessage, 'lax $.asset.metadata.item[*]?(@.key=="id").value') AS id
    , json_query(cmsmessage, 'lax $.asset.metadata.item[*]?(@.key=="author").value') AS author
    , json_query(cmsmessage, 'lax $.asset.metadata.item[*]?(@.key=="version").value') AS version
FROM dataset;

Should work for Athena engine version 3 which is based on Trino.

For version 2 you can achieve similar with array manipulation.

Upvotes: 1

Related Questions