Sharad

Reputation: 3548

How to load CSV data with enclosed by double quotes and separated by tab into HIVE table?

I am trying to load data from a CSV file in which the values are enclosed by double quotes ('"') and separated by tabs ('\t'). The load into Hive throws no error, but all the data appears to end up in a single column, and most of the values show as NULL. Below is my CREATE TABLE statement.

CREATE TABLE example
(
organization  STRING,
`order` BIGINT,
created_on  TIMESTAMP,
issue_date TIMESTAMP,
qty  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' 
ESCAPED BY '"'
STORED AS TEXTFILE;

Input file sample:

 "Organization" "Order"  "Created on"   "issue_date"   "qty"
 "GB"   "111223"    "2015/02/06 00:00:00"   "2015/05/15 00:00:00"   "5"
 "UK"   "1110"  "2015/05/06 00:00:00"   "2015/06/1 00:00:00"   "51"

And the LOAD statement to push the data into the Hive table:

 LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example;

What could be the issue, and how can I ignore the header line of the file? If I remove ESCAPED BY '"' from the CREATE statement, the data loads into the correct columns, but every value is enclosed in double quotes. How can I strip the double quotes from the values and skip the header row?

Upvotes: 23

Views: 69031

Answers (5)

cheseaux

Reputation: 5315

You can now use OpenCSVSerde, which lets you define the separator character and handles the surrounding double quotes for you:

CREATE EXTERNAL TABLE example (
   organization  STRING,
   `order` BIGINT,
   created_on  TIMESTAMP,
   issue_date TIMESTAMP,
   qty  INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)  
LOCATION '/your/folder/location/';
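One caveat (not stated in the original answer): OpenCSVSerde treats every column as STRING regardless of the declared type, so BIGINT or TIMESTAMP columns need a cast at query time or a second, typed table. The header row can also be skipped with a table property (available from Hive 0.13 onwards). A sketch, with the hypothetical column name `order_id` used to avoid the reserved word `order`:

```sql
CREATE EXTERNAL TABLE example_raw (
   organization STRING,
   order_id     STRING,
   created_on   STRING,
   issue_date   STRING,
   qty          STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)
LOCATION '/your/folder/location/'
TBLPROPERTIES ("skip.header.line.count" = "1");
```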

Upvotes: 28

user584583

Reputation: 1280

"Hive now includes an OpenCSVSerde which will properly parse those quoted fields without adding additional jars or error-prone and slow regex."

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

Source: Ben Doerr, "How to handle fields enclosed within quotes (CSV) in importing data from S3 into DynamoDB using EMR/Hive"

Upvotes: 0

Abhishek

Reputation: 7035

Use the CSV SerDe to create the table. I created a table in Hive as follows, and it works like a charm.

CREATE EXTERNAL TABLE IF NOT EXISTS myTable (
  id STRING,
  url STRING,
  name STRING
)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t")
LOCATION '<folder location>';

Upvotes: 0

prasannads

Reputation: 649

You can use the CSV SerDe (csv-serde-1.1.2.jar) to load the file without the double quotes.

download link:

http://ogrodnek.github.io/csv-serde/

The CREATE TABLE statement looks like this:

CREATE TABLE <table_name> (col_name_1 type1, col_name_2 type2, ...) ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde';

You can skip the header with the following property in the CREATE TABLE statement:

tblproperties ("skip.header.line.count"="1");
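Putting the SerDe and the header property together, a complete statement might look like this (table and column names are placeholders):

```sql
CREATE TABLE my_table (col_1 STRING, col_2 STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t")
TBLPROPERTIES ("skip.header.line.count" = "1");
```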

Upvotes: -2

maxymoo

Reputation: 36545

You don't want to use ESCAPED BY; that's for escape characters, not quote characters. Hive's ROW FORMAT DELIMITED doesn't actually support quote characters. You might want to take a look at this CSV SerDe, which accepts a quotechar property.

Also, if you have Hue, you can use the Metastore Manager web app to load the CSV; it will deal with the header row, column data types, and so on.
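If switching SerDes isn't an option, one workaround (a sketch against the question's schema, not part of this answer's original text) is to load into an all-STRING staging table, then strip the quotes with regexp_replace while casting into the typed table. Note that the question's yyyy/MM/dd dates need converting (e.g. with translate) before a TIMESTAMP cast will succeed:

```sql
-- Staging table: plain tab-delimited text, quotes kept in the values.
CREATE TABLE example_staging (
  organization STRING, order_id STRING, created_on STRING,
  issue_date STRING, qty STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Strip the surrounding quotes and cast into the typed target table.
-- translate() turns 2015/02/06 into 2015-02-06 so the TIMESTAMP cast works
-- (single-digit days such as 2015/06/1 would still need extra normalization).
INSERT OVERWRITE TABLE example
SELECT
  regexp_replace(organization, '"', ''),
  CAST(regexp_replace(order_id, '"', '') AS BIGINT),
  CAST(translate(regexp_replace(created_on, '"', ''), '/', '-') AS TIMESTAMP),
  CAST(translate(regexp_replace(issue_date, '"', ''), '/', '-') AS TIMESTAMP),
  CAST(regexp_replace(qty, '"', '') AS INT)
FROM example_staging;
```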

Upvotes: 2
