NetanelRabinowitz
NetanelRabinowitz

Reputation: 1604

Athena create table from parquet schema

Is there a way to create a table in Amazon Athena directly from parquet file based on avro schema? The schema is encoded into the file so its seems stupid that I need to actually create the DDL myself.

I saw this and also another duplication

but they are related directly to Hive, it wont work for Athena. Ideally I am looking for a way to do it programmatically without the need to define it at the console.

Upvotes: 16

Views: 10775

Answers (2)

Steve McKay
Steve McKay

Reputation: 2223

This is now more-or-less possible using AWS Glue. Glue can crawl a bunch of different data sources, including Parquet files on S3. Discovered tables are added to the Glue data catalog and queryable from Athena. Depending on your needs, you could schedule a Glue crawler to run periodically, or you could define and run a crawler using the Glue API.

If you have many separate hunks of data that share a schema, you can also use a partitioned table to reduce the overhead of making new loads available to Athena. For example, I have some daily dumps that load into tables partitioned by date. As long as the schema doesn't change, all you then need to do is MSCK REPAIR TABLE.

Upvotes: 8

andresp
andresp

Reputation: 1654

It doesn't seem to be possible with Athena as avro.schema.url is not a supported property.

table property 'avro.schema.url' is not supported. (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException...)

You can use avro.schema.literal (you would have to copy the avro json schema to the query) but I still experienced problems querying the data afterwards.

Strange errors like: SYNTAX_ERROR: line 1:8: SELECT * not allowed in queries without FROM clause

Upvotes: 2

Related Questions