Reputation: 92
I am facing the issue related to SERDE quote remove.
I have table table tracker. I have to remove double quote from all column but have to skip on column that contain the json (Product). When I load data from CSV file it also remove the quote form the json data.
CREATE EXTERNAL TABLE IF NOT EXISTS TRACKER
(
SUBSCRIBER STRING,
SERIAL STRING,
PRODUCT STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\"",
"escapeChar" = "\\"
) STORED AS TEXTFILE
LOCATION '/user/tracker'
tblproperties ("skip.header.line.count"="1");
Sample Data in csv
"Raj","400000",{"newData":"d0","olddata":"test1"}
"Rai","400332",{"newData":"data1","olddata":"test2"}
"Ram","444000",{"newData":"New Data","olddata":"test3"}
It work fine for first 2 column SUBSCRIBER and SERIAL but for last field product it remove the quote from json too.
Upvotes: 0
Views: 752
Reputation: 44921
RegexSerDe
create external table if not exists tracker
(
subscriber string
,serial string
,product string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '"(.*?)","(.*?)",(.*)')
tblproperties ("skip.header.line.count"="1")
;
select * from tracker
;
+--------------------+----------------+---------------------------------------+
| tracker.subscriber | tracker.serial | tracker.product |
+--------------------+----------------+---------------------------------------+
| Raj | 400000 | {"newData":"d0","olddata":"test1"} |
| Rai | 400332 | {"newData":"data1","olddata":"test2"} |
+--------------------+----------------+---------------------------------------+
Upvotes: 2