Daniel Mahler

Reputation: 8193

Impala access to existing Parquet tables in S3

I have some Parquet tables, created with SparkSQL, that are stored in S3. I would also like to be able to use them from Impala. I have an instance of Impala running on CDH5 that I can access using Hue.

What do I need to do to query the above data from this Impala instance?

The Impala Parquet documentation seems to be primarily about importing data into Parquet. I already have the data in Parquet and just want to point Impala at it. I am new to Impala and Hue; my experience with Parquet is from SparkSQL.

Upvotes: 1

Views: 1852

Answers (1)

Jeff Hammerbacher

Reputation: 4236

Impala has experimental support for querying data stored in S3. Here's an example CREATE TABLE statement for working with Parquet data stored in S3, taken from the Impala documentation on S3:

create table sample_data_s3 (id bigint, val int, zerofill string,
name string, assertion boolean, city string, state string)
stored as parquet location 's3a://impala-demo/sample_data';
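
Since the Parquet files already exist, an external table may be the safer choice: dropping an external table leaves the underlying files in place, whereas dropping a managed table removes them. Here is a minimal sketch, assuming a hypothetical bucket `my-bucket`, table directory `my_table`, and data file `part-00000.parquet` (none of these names come from the question), and assuming your Impala version accepts an `s3a://` path in `LIKE PARQUET`:

```sql
-- Hypothetical names throughout; substitute your own bucket, directory, and file.
-- LIKE PARQUET reads the column definitions from an existing data file,
-- so the schema written by SparkSQL does not have to be retyped by hand.
CREATE EXTERNAL TABLE my_table_s3
  LIKE PARQUET 's3a://my-bucket/my_table/part-00000.parquet'
  STORED AS PARQUET
  LOCATION 's3a://my-bucket/my_table/';

-- Pick up files added to the directory after the table was created.
REFRESH my_table_s3;

-- Quick sanity check that Impala can read the data.
SELECT COUNT(*) FROM my_table_s3;
```

These statements can be run from the Impala query editor in Hue or from impala-shell. If `LIKE PARQUET` is not available to you, listing the columns explicitly, as in the statement above, should work the same way with `EXTERNAL` added.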

Upvotes: 2
