Seb

Reputation: 378

Impala table from spark partitioned parquet files

I have generated some partitioned parquet data using Spark, and I'm wondering how to map it to an Impala table... Sadly, I haven't found any solution yet.

The schema of the parquet data is like:

{ key: long,
value: string,
date: long }

and I partitioned it by key and date, which gives me this kind of directory layout on my HDFS:

/data/key=1/date=20170101/files.parquet
/data/key=1/date=20170102/files.parquet
/data/key=2/date=20170101/files.parquet
/data/key=2/date=20170102/files.parquet
...

Do you know how I could tell Impala to create a table from this dataset with the corresponding partitions (and without having to loop over each partition, as I have read suggested elsewhere)? Is it possible?

Thank you in advance

Upvotes: 3

Views: 2181

Answers (1)

kartik

Reputation: 168

Assuming that by "schema of parquet" you mean the schema of the dataset, and that you then partitioned it by the key and date columns, the actual files.parquet files will contain only the value column (the partition columns live in the directory names, not in the files). Now you can proceed as follows.

The solution is to use an Impala external table.

create external table mytable (value STRING)
partitioned by (key BIGINT, `date` BIGINT)
stored as parquet
location '....../data/';

Note that in the above statement you have to give the path up to the data folder (`date` is back-quoted because it can clash with an Impala reserved word).

alter table mytable recover partitions;

refresh mytable;

The above two commands will automatically detect the partitions based on the partition columns declared on the table and pick up the parquet files present in the subdirectories.
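If you want to verify what Impala picked up, you can list the discovered partitions (a quick sanity check, using the table name from the statement above):

show partitions mytable;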

Now, you can start querying the data.
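For example, a query restricted to one of the partition directories shown in the question could look like this (the literal values are only illustrative):

select key, `date`, value from mytable where key = 1 and `date` = 20170101;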

Hope it helps

Upvotes: 3
