azCats
azCats

Reputation: 153

Hive - Load delimited data with special character cause off position

Let's say I want to create a simple table with 4 columns in Hive and load some pipe-delimited data.

CREATE table TEST_1 (
COL1  string,
COL2  string,
COL3  string,
COL4  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
;

Raw Data:

123|456|Dasani Bottled \| Water|789

What I expect for Col3 value is "Dasani Bottled \| Water", it has some special character "\|" in the middle thus cause Hive table column off position starting at COL3 because I create the table using "|" as the delimiter. The special character \| does have a pipe | character within it.

Is there any way to resolve the issue so Hive can load data correctly?

Thanks for any help.

Upvotes: 0

Views: 1033

Answers (1)

hlagos
hlagos

Reputation: 7947

you can add the ESCAPED BY clause to your table creation like this to allow character escaping

CREATE table TEST_1 (
COL1  string,
COL2  string,
COL3  string,
COL4  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|' ESCAPED BY '\'
;

From the Hive documentation

Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\') Escaping is needed if you want to work with data that can contain these delimiter characters.

A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').

Upvotes: 3

Related Questions