Sharad

Reputation: 3548

How to load CSV data with enclosed by double quotes and separated by tab into HIVE table?

I am trying to load data from a CSV file in which the values are enclosed by double quotes ('"') and separated by tabs ('\t'). The load into Hive throws no error, but all the data appears to end up in a single column, and most of the values show as NULL. Below is my CREATE TABLE statement.

CREATE TABLE example
(
organization  STRING,
`order` BIGINT,
created_on  TIMESTAMP,
issue_date TIMESTAMP,
qty  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' 
ESCAPED BY '"'
STORED AS TEXTFILE;

Input file sample:

 "Organization" "Order"  "Created on"   "issue_date"   "qty"
 "GB"   "111223"    "2015/02/06 00:00:00"   "2015/05/15 00:00:00"   "5"
 "UK"   "1110"  "2015/05/06 00:00:00"   "2015/06/1 00:00:00"   "51"

And the LOAD statement to push the data into the Hive table:

 LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example;

What could be the issue, and how can I ignore the header line of the file? If I remove ESCAPED BY '"' from the CREATE statement, the data loads into the correct columns, but every value is enclosed in double quotes. How can I strip the double quotes from the values and skip the header row?

Upvotes: 23

Views: 69031

Answers (5)

cheseaux

Reputation: 5315

You can now use OpenCSVSerde, which lets you define the separator character and handles the surrounding double quotes for you:

CREATE EXTERNAL TABLE example (
   organization  STRING,
   `order` BIGINT,
   created_on  TIMESTAMP,
   issue_date TIMESTAMP,
   qty  INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)  
LOCATION '/your/folder/location/';
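One caveat (not stated in the original answer): OpenCSVSerde treats every column as STRING regardless of the declared type, so BIGINT or TIMESTAMP columns need a cast at query time or a second, typed table. The header row can also be skipped with a table property (available from Hive 0.13 onwards). A sketch, with the hypothetical column name `order_id` used to avoid the reserved word `order`:

```sql
CREATE EXTERNAL TABLE example_raw (
   organization STRING,
   order_id     STRING,
   created_on   STRING,
   issue_date   STRING,
   qty          STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)
LOCATION '/your/folder/location/'
TBLPROPERTIES ("skip.header.line.count" = "1");
```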

Upvotes: 28

user584583

Reputation: 1280

"Hive now includes an OpenCSVSerde which will properly parse those quoted fields without adding additional jars or error-prone and slow regex."

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

Source: Ben Doerr, "How to handle fields enclosed within quotes (CSV) in importing data from S3 into DynamoDB using EMR/Hive"

Upvotes: 0

Abhishek

Reputation: 7035

Use the CSV SerDe to create the table. I created a table in Hive as follows, and it works like a charm.

CREATE EXTERNAL TABLE IF NOT EXISTS myTable (
  id STRING,
  url STRING,
  name STRING
)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t")
LOCATION '<folder location>';

Upvotes: 0

prasannads

Reputation: 649

You can use the CSV SerDe (csv-serde-1.1.2.jar) to load the file without the double quotes.

download link:

http://ogrodnek.github.io/csv-serde/

The CREATE TABLE statement looks like this:

CREATE TABLE <table_name> (col_name_1 type1, col_name_2 type2, ...) ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde';

You can skip the header with the following property in the CREATE TABLE statement:

tblproperties ("skip.header.line.count"="1");
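Putting the SerDe and the header property together, a complete statement might look like this (table and column names are placeholders):

```sql
CREATE TABLE my_table (col_1 STRING, col_2 STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t")
TBLPROPERTIES ("skip.header.line.count" = "1");
```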

Upvotes: -2

maxymoo

Reputation: 36545

You don't want to use ESCAPED BY; that's for escape characters, not quote characters. Hive's ROW FORMAT DELIMITED doesn't actually support quote characters. You might want to take a look at this CSV SerDe, which accepts a quotechar property.

Also, if you have Hue, you can use the Metastore Manager web app to load the CSV; it will deal with the header row, column data types, and so on.
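If switching SerDes isn't an option, one workaround (a sketch against the question's schema, not part of this answer's original text) is to load into an all-STRING staging table, then strip the quotes with regexp_replace while casting into the typed table. Note that the question's yyyy/MM/dd dates need converting (e.g. with translate) before a TIMESTAMP cast will succeed:

```sql
-- Staging table: plain tab-delimited text, quotes kept in the values.
CREATE TABLE example_staging (
  organization STRING, order_id STRING, created_on STRING,
  issue_date STRING, qty STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Strip the surrounding quotes and cast into the typed target table.
-- translate() turns 2015/02/06 into 2015-02-06 so the TIMESTAMP cast works
-- (single-digit days such as 2015/06/1 would still need extra normalization).
INSERT OVERWRITE TABLE example
SELECT
  regexp_replace(organization, '"', ''),
  CAST(regexp_replace(order_id, '"', '') AS BIGINT),
  CAST(translate(regexp_replace(created_on, '"', ''), '/', '-') AS TIMESTAMP),
  CAST(translate(regexp_replace(issue_date, '"', ''), '/', '-') AS TIMESTAMP),
  CAST(regexp_replace(qty, '"', '') AS INT)
FROM example_staging;
```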

Upvotes: 2
