Eric C

Reputation: 165

Querying Parquet file in HDFS using Impala

I'm trying to read a Parquet file directly with Impala:

impala-shell> SELECT * FROM `/path/in/hdfs/*.parquet`

I know I can do that using Spark or Drill, but I wonder whether it's possible with Impala.

Thanks

Upvotes: 1

Views: 1801

Answers (1)

thePurplePython

Reputation: 2767

Impala can't query Parquet files directly by path the way Drill can. You need to create a table on top of the Parquet files and query that table instead.

Here is a general example of an external table pointing to a Parquet directory. The Cloudera docs cover all the available methods:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_ddl

CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';
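
Once the table exists, you query it by name rather than by path. A minimal sketch, assuming the `ingest_existing_files` table from the statement above was created successfully (the `LIMIT` value is arbitrary):

```sql
-- Check the column definitions Impala inferred from the Parquet file.
DESCRIBE ingest_existing_files;

-- Query the table by name instead of by HDFS path.
SELECT * FROM ingest_existing_files LIMIT 10;

-- If new Parquet files are added to the directory later,
-- refresh the table metadata so Impala can see them.
REFRESH ingest_existing_files;
```

Because the table is external, dropping it later should remove only the table definition and leave the Parquet files in HDFS.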

Upvotes: 2
