newleaf
newleaf

Reputation: 2457

athena column contains list type data

Have a Athena table like this:

userid    visited
001a       ['New York','Los Angeles']
002b       ['Atlanta']
0030       ['New York','Paris','Albany']
....

Not quite sure how to query such list type column. How could I find

  1. users who visited 'New York', desired:
0001a 
00030
  1. all distinct cities visited? desired output:
'New York'
'Los Angeles'
'Paris'
'Atlanta'
'Albany'

Upvotes: 1

Views: 2088

Answers (1)

Guru Stron
Guru Stron

Reputation: 143088

Assuming that visited is array type from presto:

WITH dataset AS (
    SELECT *
    FROM (
             VALUES ('001a',       ARRAY['New York','Los Angeles']),
                    ('002b',       ARRAY['Atlanta', 'Los Angeles'])
         ) AS t (id, visited))

For the first one use contains function:

select id
from dataset
where contains(visited, 'Atlanta'); -- returns 002b

for the second one you can use set_union function, but it is not available for me in Athena (or I'm using it wrong =), so you can workaround with:

select array_distinct(flatten(array_agg(visited)))
from dataset; -- returns `{New York,Los Angeles,Atlanta}`

If visited is a string you can turn it into array via casting it to JSON and then to array.

Upvotes: 1

Related Questions