Bhushan Deopujari
Bhushan Deopujari

Reputation: 1

Load data in CSV format in Hive Table

I am trying to store the following data in a csv file into Hive table but not able to do it successfully

Ann, 78%,7,  
Beth,81%,5,  
Cathy,83%,2,  

The data is present in CSV file. I created the table in Hive using below definition:

Hive> CREATE TABLE test1 (Name String, Perc String, Rank String)  
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'  
WITH SERDEPROPERTIES ("input.regex" = "^(\w+)\,(\w+)\%\,(\w+)$",  
"output.format.string" = "%1$s %2$s %3$s" )
STORED AS TEXTFILE;"      
ok
hive> load data local inpath '/tmp/input.csv' into table test1;  
ok  
hive> Select * from test1;  
ok  

Name    Perc    Rank  
Null    Null    Null  
Null    Null    Null  
Null    Null    Null  

I am not able to figure out the mistake. The resulting data is not getting loaded into the table.

Upvotes: 0

Views: 3538

Answers (4)

Nirmal
Nirmal

Reputation: 9549

Use the OpenCSVSerde if you need flexibility.

CREATE EXTERNAL TABLE `mydb`.`mytable`(
    `product_name` string,
    `brand_id` string,
    `brand` string,
    `color` string,
    `description` string,
    `sale_price` string)
PARTITIONED BY (
    `seller_id` string)
ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = '\t',
    'quoteChar' = '"',
    'escapeChar' = '\\')
STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
    'hdfs://namenode.com:port/data/mydb/mytable'
TBLPROPERTIES (
    'serialization.null.format' = '',
    'skip.header.line.count' = '1')

With this, you have total control over the separator, quote character, escape character, null handling and header handling.

Look here and here.

Upvotes: 1

Micko
Micko

Reputation: 1

Based on your sample cvs data, your regex is not matching the trailing comma, and it is also not matching the optional space character as shown in the first sample line of cvs data. Your regex should be changed from: ^(\w+)\,(\w+)\%\,(\w+)$ To: ^(\w+)\,\s*(\w+)\%\,(\w+)\,$

Upvotes: 0

maxymoo
maxymoo

Reputation: 36545

Can you do use HIVE's inbuilt regexp UDF like this:

create table temp (raw STRING);

load data local inpath '/tmp/input.csv' into table temp;

create table table1
as
select regexp_extract(line, "^(\w+)\,(\w+)\%\,(\w+)$", 1) Name,
regexp_extract(line, "^(\w+)\,(\w+)\%\,(\w+)$", 2) Perc,
regexp_extract(line, "^(\w+)\,(\w+)\%\,(\w+)$", 3) Rank
from temp;

Upvotes: 0

brandon.bell
brandon.bell

Reputation: 1411

You shouldn't need the RegexSerDe. You should be able to just set the delimiter to be a comma.

CREATE TABLE test1 (Name String, Perc String, Rank String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

You could also check out this CVS Serde. https://github.com/ogrodnek/csv-serde

Upvotes: 1

Related Questions