D-2020365
D-2020365

Reputation: 92

How to skip a column from a table while removing the quote from the column in HIVE using SerDe

I am facing the issue related to SERDE quote remove.

I have table table tracker. I have to remove double quote from all column but have to skip on column that contain the json (Product). When I load data from CSV file it also remove the quote form the json data.

CREATE EXTERNAL TABLE IF NOT EXISTS TRACKER
(
SUBSCRIBER STRING,
SERIAL STRING,
PRODUCT STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"",
   "escapeChar"    = "\\"
)  STORED AS TEXTFILE
LOCATION '/user/tracker'
tblproperties ("skip.header.line.count"="1");

Sample Data in csv

"Raj","400000",{"newData":"d0","olddata":"test1"}
"Rai","400332",{"newData":"data1","olddata":"test2"}
"Ram","444000",{"newData":"New Data","olddata":"test3"}

It work fine for first 2 column SUBSCRIBER and SERIAL but for last field product it remove the quote from json too.

Upvotes: 0

Views: 752

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44921

RegexSerDe

create external table if not exists tracker
(
    subscriber  string
   ,serial      string
   ,product     string
)
    row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
    with serdeproperties ('input.regex' = '"(.*?)","(.*?)",(.*)')
    tblproperties ("skip.header.line.count"="1")
;

select * from tracker
;

+--------------------+----------------+---------------------------------------+
| tracker.subscriber | tracker.serial |            tracker.product            |
+--------------------+----------------+---------------------------------------+
| Raj                |         400000 | {"newData":"d0","olddata":"test1"}    |
| Rai                |         400332 | {"newData":"data1","olddata":"test2"} |
+--------------------+----------------+---------------------------------------+

Upvotes: 2

Related Questions