How to query Hive table which has parquet as inputformat?

Question

I have created a hive table as below:

create table parqtab(id int, name char(30), city char(30))
  partitioned by (country char(30))
  row format delimited
  fields terminated by ','
  stored as parquet
  location '/home/hive/practice';

and loaded the below data:

3,Bobby,London
4,Sunny,Amsterdam

using load command:

load data local inpath '/home/cloudera/Desktop/hid' into table parqtab partition(country='abcd');

When I run select * from parqtab, I get the following error:

Failed with exception java.io.IOException:java.lang.RuntimeException: 
hdfs://quickstart.cloudera:8020/home/hive/practice/country=abcd/hid is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [111, 114, 101, 10]
Time taken: 0.227 seconds

I understand that this is not the right way to load and query data stored in Parquet format, but I don't know how to do it correctly. Can anyone tell me what mistake I'm making here and how to query the table properly?

AM_Hawk · Accepted Answer

Not sure how you loaded your data, but if you have a CSV, just put it on HDFS. Create an external table over that directory, stored as text. Then create your Parquet table and do an INSERT INTO it from the text table; Hive will store the resulting data set as Parquet.

CREATE EXTERNAL TABLE db_name.tbl0(
col0    INT,
col1    VARCHAR(255)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
LOCATION '/someDir/tbl0';

CREATE EXTERNAL TABLE db_name.tbl1(
col0    INT,
col1    VARCHAR(255) 
)
STORED AS PARQUET
LOCATION '/someDir/tbl1';

INSERT INTO TABLE db_name.tbl1
SELECT * FROM db_name.tbl0;
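
Applied to the table from the question, the same approach might look like the sketch below. The error arises because LOAD DATA only moves the file into the table directory without converting it, so Hive finds plain text where it expects the Parquet magic bytes [80, 65, 82, 49] (ASCII "PAR1"). The staging table name parqtab_stage is hypothetical, and the sketch assumes parqtab is a managed table, so dropping the partition also deletes the misloaded text file; adjust this if your setup differs.

```sql
-- Hypothetical staging table: plain text, matching the comma-delimited input file.
CREATE TABLE parqtab_stage (id INT, name CHAR(30), city CHAR(30))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA is fine here because the file format matches the table format.
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/hid' INTO TABLE parqtab_stage;

-- Remove the text file that was loaded into the Parquet table by mistake.
-- Assumption: parqtab is a managed table, so this also deletes the partition's files.
ALTER TABLE parqtab DROP IF EXISTS PARTITION (country='abcd');

-- Hive rewrites the rows as Parquet while inserting.
INSERT INTO TABLE parqtab PARTITION (country='abcd')
SELECT id, name, city FROM parqtab_stage;

SELECT * FROM parqtab;
```

Once the insert has finished, the staging table can be dropped if it is no longer needed.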
