rongenre

Reputation: 1334

Exporting Spark Dataframe to Athena

I'm running a pyspark job which creates a dataframe and stores it to S3 as below:

df.write.saveAsTable(table_name, format="orc", mode="overwrite", path=s3_path)

I can read the ORC file without a problem just by using spark.read.orc(s3_path), so the schema information is in the ORC file, as expected.
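For reference, this is a quick way to confirm the schema survives the round trip, assuming spark is the active SparkSession and s3_path is the same path used in the write above:

# Read the ORC data back and print the schema Spark infers from it
# (assumes `spark` is an active SparkSession and `s3_path` matches the write path)
df_check = spark.read.orc(s3_path)
df_check.printSchema()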

However, I'd really like to view the dataframe contents using Athena. Clearly, if I wrote to my Hive metastore, I could call Hive and run show create table ${table_name}, but that's a lot of work when all I want is a simple schema.

Is there another way?

Upvotes: 2

Views: 3561

Answers (1)

Al Belsky

Reputation: 1592

One of the approaches would be to set up a Glue crawler for your S3 path, which would create a table in the AWS Glue Data Catalog. Alternatively, you could create the Glue table definition via the Glue API.
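A minimal boto3 sketch of the crawler route might look like the following; the crawler name, IAM role, database name, region, and S3 path are placeholders you would replace with your own:

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # region is an assumption

# Create a crawler pointed at the S3 path the dataframe was written to.
# "my-orc-crawler", "MyGlueServiceRole" and "my_database" are hypothetical names.
glue.create_crawler(
    Name="my-orc-crawler",
    Role="MyGlueServiceRole",          # IAM role with Glue and S3 read permissions
    DatabaseName="my_database",        # Glue database the table will be created in
    Targets={"S3Targets": [{"Path": "s3://my-bucket/path/to/orc/"}]},
)

# Run the crawler once; when it finishes, the table (with the schema inferred
# from the ORC files) appears in the Glue Data Catalog and is visible to Athena.
glue.start_crawler(Name="my-orc-crawler")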

The AWS Glue Data Catalog is fully integrated with Athena, so you would see your Glue table in Athena, and be able to query it directly: http://docs.aws.amazon.com/athena/latest/ug/glue-athena.html
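Once the table shows up in the catalog, you can query it from Athena, for example via boto3; the database, table, and output bucket below are the hypothetical names from the crawler sketch:

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

# Start a query against the Glue-catalogued table; Athena writes results to S3.
response = athena.start_query_execution(
    QueryString="SELECT * FROM my_database.my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])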

Upvotes: 0
