Metadata
Metadata

Reputation: 2083

How to query Hive table which has parquet as inputformat?

I have created a hive table as below:

create table parqtab(id int, name char(30), city char(30))
  partitioned by (country char(30))
  row format delimited
  fields terminated by ','
  stored as parquet
  location '/home/hive/practice';

and loaded the below data:

3,Bobby,London
4,Sunny,Amsterdam

using load command:

load data local inpath '/home/cloudera/Desktop/hid' into table parqtab partition(country='abcd');

When I query for select * from parqtab, it is giving me the following error:

Failed with exception java.io.IOException:java.lang.RuntimeException: 
hdfs://quickstart.cloudera:8020/home/hive/practice/country=abcd/hid is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [111, 114, 101, 10]
Time taken: 0.227 seconds

I understood that it is not the right way to query the data which is stored in parquet format. But I don't understand how to do it. Can anyone tell me what is the mistake Im making here and how to properly query the table ?

Upvotes: 0

Views: 3352

Answers (2)

AM_Hawk
AM_Hawk

Reputation: 681

Not sure how you loaded your data but if you have a csv just put that on hdfs. Create an external table over that directory stored as text. create your parquet table and you can do an insert into and hive will then store the resulting data set as parquet.

CREATE EXTERNAL TABLE db_name.tbl0(
col0    INT,
col1    VARCHAR(255)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
LOCATION '/someDir/tbl0';

CREATE EXTERNAL TABLE db_name.tbl1(
col0    INT,
col1    VARCHAR(255) 
)
STORED AS PARQUET
LOCATION '/someDir/tbl1';
;

INSERT INTO TABLE tbl1
select * from tbl0;

Upvotes: 3

Explorer
Explorer

Reputation: 1647

As @AM_Hawk suggested if you have csv then you should simply use STORED AS TEXTFILE in your create table. Something like below:

create external table parqtab(id int, name char(30), city char(30))
partitioned by (country char(30))  
ROW FORMAT DELIMITED  
FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE 
LOCATION '/home/hive/practice';

The error clearly says that it is not able to find any parquet file.

If you really need Parquet then you can write your hive data as parquet file using Spark like below and create a Hive schema over it as @AM_Hawk suggested in his answer:

val iSql="your hive select statement"
val hiveDF = hsc.sql(iSql)
hiveDF.coalesce(noExecutors).write.mode("append/overwrite").parquet(parquetLoc)

Upvotes: 0

Related Questions