Reputation: 209
I am trying to load CSV file data into my Hive table,but but it has delimiter(,) , in one column's value, so Hive is taking it as a delimiter and loading it into a new column. I tried using escape sequence \ but in that I also \ (it its not working and always loading data in new column after , .
My CSV file.:
id,name,desc,per1,roll,age
226,a1,"\"double bars","item1 and item2\"",0.0,10,25
227,a2,"\"doubles","item2 & item3 item4\"",0.1,20,35
228,a3,"\"double","item3 & item4 item5\"",0.2,30,45
229,a4,"\"double","item5 & item6 item7\"",0.3,40,55
I have updated my table.:
create table testing(id int, name string, desc string, uqc double, roll int, age int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = '"',
"escapeChar" = "\\" ) STORED AS textfile;
But still I'm getting data in a different column after ,.
I'm using load data in path command.
Upvotes: 1
Views: 975
Reputation: 38290
This is how to create table based on RegexSerDe.
Each column should have corresponding capturing group ()
in the regex. You can easily debug regex without creating the table using regex_replace
:
select regexp_replace('226,a1,"\"double bars","item1 and item2\"",0.0,10,25',
'^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*', --6 groups
'$1 $2 $3 $4 $5 $6'); --space delimited fields
Result:
226 a1 "double bars","item1 and item2" 0.0 10 25
If it seems good, create table:
create external table testing(id int,
name string,
desc string,
uqc double,
roll int,
age int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex'='^(\\d+?),(.*?),"(.*)",([0-9.]*),([0-9]*),([0-9]*).*')
location ....
TBLPROPERTIES("skip.header.line.count"="1")
;
Read this article for more details.
Upvotes: 1