Reputation: 75
I wrote the following query:
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
`price` double,
`genre` string,
`all_genres` string,
`languages` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'/user/abc/data'
TBLPROPERTIES (
'transient_lastDdlTime'='1588006839');
the last two columns are array list with the following pattern: ['val1', 'val2', 'val3']
no error occurs after starting.
After running this part of the code, everything is ok, but when I add
'SELECT * FROM temp_data'
I get an error:
Failed to fetch next batch for the Resultset
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.FileFormatException: Malformed ORC file /user/abc/data/data.csv. Invalid postscript.
Does anyone know how to fix it?
EDIT:
After this:
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/abc/data'
TBLPROPERTIES ('skip.header.line.count'='1');
I have a problem, becouse Select incorrectly returns data because the array has commas
EDIT 2 I managed to solve it by adding:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = "\"")
LOCATION '/user/ABC/data'
TBLPROPERTIES ('skip.header.line.count'='1', 'colelction.delim'=',');
Upvotes: 1
Views: 1181
Reputation: 38335
In addition to what @RamGhadiyaram said, I'd like to add:
If the file is CSV, then define table as TEXTFILE, not ORC.
Instead of this:
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
Write
STORED AS TEXTFILE
Also the idea of temporary table is that it lives during current session, data is stored in the users scratch directory and is deleted when the session ends. Naturally it is managed table. Better remove EXTERNAL even if hive allows this, or use normal, not temporary table if you want data to be persistent.
Upvotes: 1
Reputation: 29237
AFAIK since array columns are different, you need to create array data type
arrays: ARRAY<data_type>
like this a
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
`price` double,
`genre` string,
`all_genres` array<string>,
`languages` array<string>) ... remaining as it is.
other wise I think its not able to determine thease array columns.
Working With Hive Complex Data Types
Upvotes: 1