Philipp Johannis
Philipp Johannis

Reputation: 2956

Transform JSON to to ARRAY<MAP> in Athena/Presto

I have a table available in Athena which has one column with JSON structured as following:

{
    "455a9410-29a8-48a3-ad22-345afa3cd295":
    {
        "legacy_id": 1599677886,
        "w_ids":
        [
            "845254682",
            "831189092"
        ]
    },
    "5e74c911-0b63-4b84-8ad4-77dd9bed7b53":
    {
        "legacy_id": 1599707069,
        "w_ids":
        [
            "1032024432"
        ]
    },
    "7b988890-20ff-4279-94df-198369a58848":
    {
        "legacy_id": 1601097861,
        "w_ids":
        [
            "1032024432"
        ]
    }
}

I would like to convert this into an ARRAY in the following format:

[
    {"new_id"="455a9410-29a8-48a3-ad22-345afa3cd295","legacy_id"=1599677886,"w_ids"=["845254682","831189092"]},
    {"new_id"="5e74c911-0b63-4b84-8ad4-77dd9bed7b53","legacy_id"=1599707069,"w_ids"=["1032024432"]},
    {"new_id"="7b988890-20ff-4279-94df-198369a58848","legacy_id"=1601097861,"w_ids"=["1032024432"]}
]

I already was able to extract legacy_idand w_ids with the following statement but I struggle to add the original key as value:

 with example_data as
 (
     select * from (
        VALUES('{    "455a9410-29a8-48a3-ad22-345afa3cd295":    {        "legacy_id": 1599677886,        "w_ids":        [            "845254682",            "831189092"        ]    },    "5e74c911-0b63-4b84-8ad4-77dd9bed7b53":    {        "legacy_id": 1599707069,        "w_ids":        [            "1032024432"        ]    },    "7b988890-20ff-4279-94df-198369a58848":    {        "legacy_id": 1601097861,        "w_ids":        [            "1032024432"        ]    }}')
     ) as t(col)
 )
select *
,transform(map_values(cast(json_parse(col) AS map(varchar, json))),entry -> MAP_FROM_ENTRIES(ARRAY[('legacy_id',json_extract(entry,'$.legacy_id')),('w_ids',json_extract(entry,'$.w_ids'))]))
from example_data;

Upvotes: 2

Views: 2546

Answers (1)

Guru Stron
Guru Stron

Reputation: 143243

One approach can be using map_values over transform_values instead of transform over map_values:

select map_values(
        transform_values(
            cast(json_parse(col) AS map(varchar, json)),
            (key, entry)->MAP_FROM_ENTRIES(
                ARRAY [('new_id', cast(key as json)),
                ('legacy_id', json_extract(entry, '$.legacy_id')),
                ('w_ids', json_extract(entry, '$.w_ids')) ]
            )
        )
    )
from example_data;

Output:

_col0
[{new_id='455a9410-29a8-48a3-ad22-345afa3cd295', legacy_id=1599677886, w_ids=['845254682','831189092']}, {new_id='5e74c911-0b63-4b84-8ad4-77dd9bed7b53', legacy_id=1599707069, w_ids=['1032024432']}, {new_id='7b988890-20ff-4279-94df-198369a58848', legacy_id=1601097861, w_ids=['1032024432']}]

Upvotes: 2

Related Questions