Hive - Load pipe delimited data starting with pipe

Question

Let's say I want to create a simple table with 4 columns in Hive and load some pipe-delimited data starting with pipe.

CREATE table TEST_1 (
COL1  string,
COL2  string,
COL3  string,
COL4  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';

Raw Data:

|123|456|Dasani Bottled|5|,   
|124|455|Test Bottled |7|

Table getting date like

----------------------------------
COL1  COL2   COL3     COL4
----------------------------------
      123   456     Dasani Bottled
      123   455     Test Bottled
----------------------------------

COL1 Getting Empty and last column not loaded. I have try to load the csv file using Hadoop put command.

Help me to resolve this issue.

Koushik Roy · Accepted Answer

You can solve this by 2 ways.

Remove first column before processing the file. This is clean and preferable solution.

cut -d "|" -f 2- input_filename > output_filename

Then use this output_filename as your input to the load process.

-d "|" - this says, use pipe as a delimiter. -f 2- - this says, extract everything after first field.

add a dummy column in the beginning of the table like this

CREATE table TEST_1 (
dummy string,
COL1  string,
COL2  string,
COL3  string,
COL4  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';

And then proceed with the loading data. Then you can ignore this dummy column or store the data into a final table without this column or create a view on top of this to exclude this dummy column.

Hive - Load pipe delimited data starting with pipe

Answers (2)

Related Questions