Reputation: 99
I am generating Parquet files (partitioned by SetId, using Snappy compression) with Spark and storing them at an HDFS location.
df.coalesce(1).write
  .partitionBy("SetId")
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .option("header", "true")
  .save(args(1))
The Parquet data files are stored under /some-hdfs-path/testsp
I then create the Hive table for it as follows:
CREATE EXTERNAL TABLE DimCompany(
  CompanyCode string,
  CompanyShortName string,
  CompanyDescription string,
  BusinessDate string,
  PeriodTypeInd string,
  IrisDuplicateFlag int,
  GenTimestamp timestamp
)
PARTITIONED BY (SetId int)
STORED AS PARQUET
LOCATION '/some-hdfs-path/testsp'
TBLPROPERTIES ('skip.header.line.count'='1', 'parquet.compress'='snappy');
However, when I run a SELECT on the table in Hive, it returns no results.
I tried:
running the MSCK repair command:
msck repair table dimcompany;
setting the following:
spark.sql("SET spark.sql.hive.convertMetastoreParquet=false")
Neither of these works. How can I solve this?
Upvotes: 4
Views: 1854
Reputation: 158
The issue is that your partition column, SetId, uses upper-case letters.
Hive converts column names to lowercase, so your partition column is stored in the metastore as setid instead of SetId. When Hive then searches for partition folders in the case-sensitive HDFS file system, it looks for setid=some_value and finds nothing, because your data folders have the form SetId=some_value.
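For example, the write above produces partition folders of the first form below, while Hive scans for the second (paths illustrative, based on the location in the question):
/some-hdfs-path/testsp/SetId=1/   <-- what Spark wrote
/some-hdfs-path/testsp/setid=1/   <-- what Hive looks for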
To make this work, write the partition column in lower case or snake_case. You can do this by aliasing the column in your DataFrame:
import org.apache.spark.sql.functions.col

df.select(
  // ... your other columns ...
  col("SetId").alias("set_id")
)
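If you'd rather not enumerate the remaining columns, Spark's withColumnRenamed does the same rename in one call (a sketch, reusing the df and write logic from the question):
df.withColumnRenamed("SetId", "set_id")
  .coalesce(1).write
  .partitionBy("set_id")
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .save(args(1))
Remember to change the DDL to match, i.e. PARTITIONED BY (set_id int).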
You may also have to set these properties before executing the CREATE statement, based on this related Stack Overflow post:
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
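If you issue the CREATE statement through Spark rather than the Hive CLI, the same settings can be applied there as well (a sketch, assuming the SparkSession is available as spark, as in the question):
spark.sql("SET hive.mapred.supports.subdirectories=true")
spark.sql("SET mapred.input.dir.recursive=true")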
After you create your table, also try running
msck repair table <your_schema.your_table>;
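Once the repair succeeds, you can verify that the partitions were registered:
show partitions <your_schema.your_table>;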
Upvotes: 1