Reputation: 16044
I crawled data using aws glue to import json data from an s3 folder that contains data where the root braces is an array like this:
[{id: '1', name: 'rick'},{id: '2', name: 'morty'}]
This ends up resulting in a schema like this:
array<struct<expand:string,id:string,name:string>>
How do I query by name
in Athena?
If I try this:
SELECT * FROM people_s3_buckets WHERE name = "rick";
I get the following error:
SYNTAX_ERROR: Column 'name' cannot be resolved
Perhaps, there is a way to setup the Glue crawler to add just the elements within the array and avoid the nesting alltogether?
Upvotes: 12
Views: 26576
Reputation: 2668
In order to query fields of elements within an array, you would need to UNNEST
it first. Assuming that structure array<struct<expand:string,id:string,name:string>>
corresponds to column members
, you would need to do
SELECT
*
FROM
people_s3_buckets,
UNNEST(members) as t(member)
WHERE
member.name = 'rick'
Note, you need to use single quotes instead fo double quotes.
Here is the official AWS docs on handling arrays in AWS Athena: Querying Arrays
Upvotes: 19