mrc
mrc

Reputation: 3173

How to query an array field (AWS Glue)?

I have a table in AWS Glue, and the crawler has defined one field as array. The content is in S3 files that have a json format. The table is TableA, and the field is members.

There are a lot of other fields such as strings, booleans, doubles, and even structs.

I am able to query them all using a simpel query such as:

SELECT
    content.my_boolean,
    content.my_string,
    content.my_struct.value
FROM schema.tableA;

The issue is when I add content.members into the query. The error I get is: [Amazon](500310) Invalid operation: schema "content" does not exist.

Content exists because i am able to select other fiels from the main key in the json (content). Probably is something related with how to perform the query agains array field in Spectrum.

Any idea?

Upvotes: 3

Views: 1586

Answers (2)

KoalaKey
KoalaKey

Reputation: 242

You need to create a Glue Classifier.

Select JSON as Classifier type

and for the JSON Path input the following:

$[*]

then run your crawler. It will infer your schema and populate your table with the correct fields instead of just one big array. Not sure if this was what you were looking for but figured I'd drop this here just in case others had the same problem I had.

Upvotes: 0

Hyruma92
Hyruma92

Reputation: 876

You have to rename the table to extract the fields from the external schema:

SELECT
    a.content.my_boolean,
    a.content.my_string,
    a.content.my_struct.value
FROM schema.tableA a;

I had the same issue on my data, I really don't know why it needs this cast but it works. If you need to access elements of an array you have to explod it like:

SELECT member.<your-field>,
FROM schema.tableA a, a.content.members as member;

Reference

Upvotes: 3

Related Questions