Reputation: 1
I am trying to store the following data in a csv file into Hive table but not able to do it successfully
Ann, 78%,7,
Beth,81%,5,
Cathy,83%,2,
The data is present in CSV file. I created the table in Hive using below definition:
Hive> CREATE TABLE test1 (Name String, Perc String, Rank String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "^(\w+)\,(\w+)\%\,(\w+)$",
"output.format.string" = "%1$s %2$s %3$s" )
STORED AS TEXTFILE;"
ok
hive> load data local inpath '/tmp/input.csv' into table test1;
ok
hive> Select * from test1;
ok
Name Perc Rank
Null Null Null
Null Null Null
Null Null Null
I am not able to figure out the mistake. The resulting data is not getting loaded into the table.
Upvotes: 0
Views: 3538
Reputation: 9549
Use the OpenCSVSerde
if you need flexibility.
CREATE EXTERNAL TABLE `mydb`.`mytable`(
`product_name` string,
`brand_id` string,
`brand` string,
`color` string,
`description` string,
`sale_price` string)
PARTITIONED BY (
`seller_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = '\t',
'quoteChar' = '"',
'escapeChar' = '\\')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://namenode.com:port/data/mydb/mytable'
TBLPROPERTIES (
'serialization.null.format' = '',
'skip.header.line.count' = '1')
With this, you have total control over the separator, quote character, escape character, null handling and header handling.
Upvotes: 1
Reputation: 1
Based on your sample cvs data, your regex is not matching the trailing comma, and it is also not matching the optional space character as shown in the first sample line of cvs data. Your regex should be changed from: ^(\w+)\,(\w+)\%\,(\w+)$ To: ^(\w+)\,\s*(\w+)\%\,(\w+)\,$
Upvotes: 0
Reputation: 36545
Can you do use HIVE's inbuilt regexp UDF like this:
create table temp (raw STRING);
load data local inpath '/tmp/input.csv' into table temp;
create table table1
as
select regexp_extract(line, "^(\w+)\,(\w+)\%\,(\w+)$", 1) Name,
regexp_extract(line, "^(\w+)\,(\w+)\%\,(\w+)$", 2) Perc,
regexp_extract(line, "^(\w+)\,(\w+)\%\,(\w+)$", 3) Rank
from temp;
Upvotes: 0
Reputation: 1411
You shouldn't need the RegexSerDe. You should be able to just set the delimiter to be a comma.
CREATE TABLE test1 (Name String, Perc String, Rank String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
You could also check out this CVS Serde. https://github.com/ogrodnek/csv-serde
Upvotes: 1