Below are my S3 paths, under each of which multiple folders are present. Each folder contains a CSV file, and each file has a different schema.
The values within the curly braces {} are dynamic.
s3://test_bucket/{val1}/data/{val2}/input/latest/
s3://test_bucket/{val1}/data/{val2}/input/archived/timestamp={val3}/
I want to create the Athena tables using an AWS Glue crawler. We can have separate databases for the input data: one for current (latest) and one for archived.
The resulting tables should be partitioned on val1 and val2, for both the current and the archived data. The archived tables should additionally be partitioned on val3.
What approach can I take to configure this so the tables are created dynamically? I would really appreciate your time. Please let me know if more information is needed.
My suggestion: use the Glue API to create the crawlers, giving each one the specific S3 path to read and the database name to write to.
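A minimal sketch of that idea with boto3, assuming one crawler per (val1, val2) pair and per folder type. The IAM role ARN, the database names `input_data_current` / `input_data_archive`, the crawler naming scheme, and the table prefix are all hypothetical placeholders. The Glue client is passed in, so the spec-building part can be inspected without AWS credentials.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical values: replace with your own role and database names.
GLUE_ROLE_ARN = "arn:aws:iam::123456789012:role/MyGlueCrawlerRole"
CURRENT_DB = "input_data_current"
ARCHIVE_DB = "input_data_archive"
BUCKET = "test_bucket"


def crawler_specs(pairs: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Build create_crawler keyword arguments for every (val1, val2) pair."""
    specs = []
    for val1, val2 in pairs:
        base = f"s3://{BUCKET}/{val1}/data/{val2}/input"
        # One crawler for the latest folder, one for the archived folder.
        for kind, suffix, db in (
            ("latest", "latest/", CURRENT_DB),
            ("archived", "archived/", ARCHIVE_DB),
        ):
            specs.append({
                "Name": f"crawler_{kind}_{val1}_{val2}",
                "Role": GLUE_ROLE_ARN,
                "DatabaseName": db,
                "Targets": {"S3Targets": [{"Path": f"{base}/{suffix}"}]},
                # Prefix keeps tables from different (val1, val2) pairs apart.
                "TablePrefix": f"{val1}_{val2}_",
            })
    return specs


def create_crawlers(glue_client, pairs: List[Tuple[str, str]], start: bool = False) -> None:
    """Create (and optionally start) one crawler per spec via a boto3 Glue client."""
    for spec in crawler_specs(pairs):
        glue_client.create_crawler(**spec)
        if start:
            glue_client.start_crawler(Name=spec["Name"])
```

Usage would look like `create_crawlers(boto3.client("glue"), [("a", "b")], start=True)`. Note the trade-off: with this layout val1 and val2 end up in the table names rather than as partition columns, while a crawler pointed at `archived/` would typically pick up the Hive-style `timestamp=...` folders as a partition column named `timestamp`. Table names derived from val1/val2 must also be valid Athena identifiers (for example, no hyphens).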
The simplest and most efficient way would be to use partition projection. See the docs: https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html
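A rough sketch of what that could look like for these paths, assuming each table's CSV files share one schema. It generates an Athena `CREATE EXTERNAL TABLE` statement that uses `storage.location.template` to map the non-Hive path layout onto the partition columns, with every partition column given the `injected` projection type because the values are dynamic. The table names and data columns are hypothetical; if val3 has a known timestamp format, a `date` projection type may be a better fit.

```python
from typing import Dict

BUCKET = "test_bucket"


def projection_ddl(table: str, columns: Dict[str, str], archived: bool) -> str:
    """Return a CREATE EXTERNAL TABLE statement that uses partition projection."""
    partitions = ["val1", "val2"] + (["val3"] if archived else [])
    # ${...} placeholders are filled in by Athena from the partition values.
    if archived:
        template = f"s3://{BUCKET}/${{val1}}/data/${{val2}}/input/archived/timestamp=${{val3}}/"
    else:
        template = f"s3://{BUCKET}/${{val1}}/data/${{val2}}/input/latest/"
    props = {
        "projection.enabled": "true",
        "storage.location.template": template,
        "skip.header.line.count": "1",  # assumes each CSV has a header row
    }
    for p in partitions:
        # injected: partition values come from the query's WHERE clause.
        props[f"projection.{p}.type"] = "injected"
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns.items())
    parts = ",\n  ".join(f"`{p}` string" for p in partitions)
    props_sql = ",\n  ".join(f"'{k}'='{v}'" for k, v in props.items())
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n  {cols}\n)\n"
        f"PARTITIONED BY (\n  {parts}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION 's3://{BUCKET}/'\n"
        f"TBLPROPERTIES (\n  {props_sql}\n)"
    )
```

With `injected` projection, queries must supply the partition values, e.g. `SELECT * FROM input_archived WHERE val1 = 'x' AND val2 = 'y' AND val3 = 'z'`. No crawler or `MSCK REPAIR TABLE` run is needed for new folders, because Athena computes the S3 locations from the template at query time.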