Reputation: 25032
Given an original table in Athena that is similar to this contrived example below:
Original Table:
a | b | cmsmessage | c |
---|---|---|---|
... | ... | {"asset": {"metadata": {"item": [{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]}}} | ... |
... | ... | {"asset": {"metadata": {"item": [{ "key": "id", "value": "456"}, {"key": "author", "value": "John"}, {"key": "version", "value": "3"}]}}} | ... |
... | ... | {"asset": {"metadata": {"item": [{"key": "version", "value": "2"}, {"key": "author", "value": "Sally"}, {"key": "id", "value": "789" }]}}} | ... |
I am trying to create a View from the stringified JSON in the cmsmessage
column:
Desired Resultant View:
id | author | version |
---|---|---|
123 | Rob | 1 |
456 | John | 3 |
789 | Sally | 3 |
As you can see in the "Desired Resultant View" I'm essentially wanting to extract the data from each object of the item
array based on the value of the key
value.
[{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]
I've seen/tried various examples that access items in an array by index, e.g. [0]
such as the examples in Athena docs. However the index position in the array may vary given the json/string in the cmsmessage
column of the original table.
My Failed attempt:
Below shows my failed attempt:
WITH dataset AS
(
SELECT *
FROM (VALUES
('{"asset": {"metadata": {"item": [{"key": "author", "value": "Rob"}, { "key": "id", "value": "123"}, {"key": "version", "value": "1"}]}}}'),
('{"asset": {"metadata": {"item": [{ "key": "id", "value": "456"}, {"key": "author", "value": "John"}, {"key": "version", "value": "3"}]}}}'),
('{"asset": {"metadata": {"item": [{"key": "version", "value": "2"}, {"key": "author", "value": "Sally"}, {"key": "id", "value": "789" }]}}}')
) AS t (cmsmessage)
)
SELECT
json_extract_scalar(objArray, '$.key') as _keys,
json_extract_scalar(objArray, '$.value') as _values
FROM dataset
CROSS JOIN UNNEST(CAST(json_extract(cmsmessage, '$.asset.metadata.item') as array(json))) as t (objArray)
Upvotes: 1
Views: 79
Reputation: 143003
Arguably the easiest option would be using json_query
which has better support for JSON path syntax:
SELECT json_query(cmsmessage, 'lax $.asset.metadata.item[*]?(@.key=="id").value') AS id
, json_query(cmsmessage, 'lax $.asset.metadata.item[*]?(@.key=="author").value') AS author
, json_query(cmsmessage, 'lax $.asset.metadata.item[*]?(@.key=="version").value') AS version
FROM dataset;
Should work for Athena engine version 3 which is based on Trino.
For version 2 you can achieve similar with array manipulation.
Upvotes: 1