Maciej Wawrzyniak
Maciej Wawrzyniak

Reputation: 75

I can't load csv to temporary table HIVE

I wrote the following query:

CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
  `price` double, 
  `genre` string, 
  `all_genres` string, 
  `languages` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  '/user/abc/data'
TBLPROPERTIES (
  'transient_lastDdlTime'='1588006839');

the last two columns are array list with the following pattern: ['val1', 'val2', 'val3']
no error occurs after starting.

After running this part of the code, everything is ok, but when I add
'SELECT * FROM temp_data'
I get an error:
Failed to fetch next batch for the Resultset org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.FileFormatException: Malformed ORC file /user/abc/data/data.csv. Invalid postscript.
Does anyone know how to fix it?

EDIT:
After this:

FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/abc/data'
TBLPROPERTIES ('skip.header.line.count'='1'); 

I have a problem, becouse Select incorrectly returns data because the array has commas

EDIT 2 I managed to solve it by adding:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar"     = "\"")  
LOCATION '/user/ABC/data'
TBLPROPERTIES ('skip.header.line.count'='1', 'colelction.delim'=',');

Upvotes: 1

Views: 1181

Answers (2)

leftjoin
leftjoin

Reputation: 38335

In addition to what @RamGhadiyaram said, I'd like to add:

If the file is CSV, then define table as TEXTFILE, not ORC.

Instead of this:

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

Write

STORED AS TEXTFILE

Also the idea of temporary table is that it lives during current session, data is stored in the users scratch directory and is deleted when the session ends. Naturally it is managed table. Better remove EXTERNAL even if hive allows this, or use normal, not temporary table if you want data to be persistent.

Upvotes: 1

Ram Ghadiyaram
Ram Ghadiyaram

Reputation: 29237

AFAIK since array columns are different, you need to create array data type

arrays: ARRAY<data_type>

like this a

CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS `temp_data`(
  `price` double, 
  `genre` string, 
  `all_genres` array<string>, 
  `languages` array<string>) ... remaining as it is.

other wise I think its not able to determine thease array columns.

Working With Hive Complex Data Types

Upvotes: 1

Related Questions