Srinivas Bandaru

Reputation: 311

Unable to directly load Hive Parquet table using Spark DataFrame

I have gone through the related posts on SO and couldn't find this specific issue anywhere on the internet.

I am trying to load a Hive table (an external table pointing to Parquet files), but the Spark DataFrame reads only the schema, not the data. The same table queries fine from the Hive shell; loading it into a DataFrame returns no rows. Below are my script and the DDL. I am using Spark 2.1 (MapR distribution).

Unable to read data from Spark for a Hive table with underlying Parquet files:

scala> val df4 = spark.sql("select * from default.Tablename")
scala> df4.show()
+----+----+----+---+----+-------------+---------+
|col1|col2|col3|key|col4|record_status|source_cd|
+----+----+----+---+----+-------------+---------+
+----+----+----+---+----+-------------+---------+
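As a sanity check, one can read the Parquet files directly from the table's LOCATION, bypassing the Hive metastore entirely (a minimal sketch; the path is taken from the DDL below):

// Read the Parquet files straight from the table's LOCATION,
// skipping the metastore lookup
val dfDirect = spark.read.parquet("maprfs:/Datalocation/Tablename")
dfDirect.show()

If this shows rows while the spark.sql query above returns none, the files themselves are fine and the problem lies in how Spark resolves the table metadata.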

Hive DDL
CREATE EXTERNAL TABLE `Tablename`(
  `col1` string,
  `col2` string,
  `col3` decimal(19,0),
  `key` string,
  `col6` string,
  `record_status` string,
  `source_cd` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='maprfs:abc/bds/dbname.db/Tablename')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'maprfs:/Datalocation/Tablename'
TBLPROPERTIES (
  'numFiles'='2',
  'spark.sql.sources.provider'='parquet',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"col1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"col2\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"col3\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"key\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"col6\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"record_status\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"source_cd\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}',
  'totalSize'='68216',
  'transient_lastDdlTime'='1502904476')
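For diagnosis, a quick way to see how Spark itself resolves the table is DESCRIBE FORMATTED (a sketch; the exact output layout varies by Spark version, and that the displayed path comes from the 'path' SerDe property rather than the Hive LOCATION is an assumption about datasource-table handling):

// Dump Spark's view of the table, including its storage properties
spark.sql("DESCRIBE FORMATTED default.Tablename").show(100, false)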

Upvotes: 2

Views: 2231

Answers (1)

user1997656

Reputation: 532

Remove 'spark.sql.sources.provider'='parquet' from the TBLPROPERTIES and it will work. With that property set, Spark treats this as one of its own datasource tables and most likely resolves the data from the 'path' SerDe property ('maprfs:abc/bds/dbname.db/Tablename') instead of the Hive LOCATION ('maprfs:/Datalocation/Tablename'), so it scans the wrong location and returns no rows.
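For reference, a minimal sketch of clearing the property in place rather than recreating the table (the table name matches the question; also dropping the stale spark.sql.sources.schema.* entries is an assumption, since they carry the old Spark-side view of the schema):

// Drop the Spark datasource properties so Spark falls back to reading
// the table as a plain Hive Parquet table at its LOCATION
spark.sql("""ALTER TABLE default.Tablename UNSET TBLPROPERTIES (
  'spark.sql.sources.provider',
  'spark.sql.sources.schema.numParts',
  'spark.sql.sources.schema.part.0')""")
// Invalidate Spark's cached metadata for the table
spark.sql("REFRESH TABLE default.Tablename")

The same ALTER TABLE ... UNSET TBLPROPERTIES statement can also be run from the Hive shell.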

Upvotes: 1
