Reputation: 23
Let's say I want to create a simple table with 4 columns in Hive and load some pipe-delimited data in which every row starts with a pipe.
CREATE table TEST_1 (
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';
Raw Data:
|123|456|Dasani Bottled|5|,
|124|455|Test Bottled |7|
The table is getting data like this:
----------------------------------
COL1   COL2   COL3   COL4
----------------------------------
       123    456    Dasani Bottled
       124    455    Test Bottled
----------------------------------
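For illustration (this is not Hive itself, just a sketch of the same splitting logic), the behavior matches what a plain split on the pipe produces:

```python
# A plain split on '|' shows roughly what Hive's default text SerDe sees:
row = "|123|456|Dasani Bottled|5|"
fields = row.split("|")
print(fields)       # ['', '123', '456', 'Dasani Bottled', '5', '']
# The first four fields are mapped to COL1..COL4, so COL1 gets the
# empty string in front of the leading pipe and the value 5 is dropped.
print(fields[:4])   # ['', '123', '456', 'Dasani Bottled']
```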
COL1 comes out empty and the last column is not loaded. I have tried loading the file into the table location with the Hadoop put command.
Please help me resolve this issue.
Upvotes: 2
Views: 1215
Reputation: 38290
If fixing the data files is not an option, you can use RegexSerDe instead of LazySimpleSerDe (the default SerDe for text files).
Describe how your data looks in a regex; each column should have a corresponding capturing group () in the regex.
To test how your regex works before creating the table, use regexp_replace:
select regexp_replace('|123|456|Dasani Bottled|5|, ', --your row example
'^\\|(.*?)\\|(.*?)\\|(.*?)\\|(.*?)\\|.*', --4 groups are in the regex
'$1 $2 $3 $4'); --output space delimited fields
Result:
123 456 Dasani Bottled 5
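Outside Hive, the same lazy-group pattern can be checked with Python's re module (the pattern is the same; Python source only needs single backslashes, whereas Hive string literals double them):

```python
import re

# Same pattern as in the Hive example, with single backslashes.
pattern = r'^\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|.*'

row = '|123|456|Dasani Bottled|5|'
m = re.match(pattern, row)
# Each capturing group becomes one column in the Hive table.
print(m.groups())   # ('123', '456', 'Dasani Bottled', '5')
```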
If the regex works as expected, create the table (it does not have to be external):
create external table TEST_1 (
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex'='^\\|(.*?)\\|(.*?)\\|(.*?)\\|(.*?)\\|.*')
location ....
;
Then copy the files into the table location or use the LOAD DATA command.
Upvotes: 2
Reputation: 7387
You can solve this in two ways.

The first way is to pre-process the file and strip the leading pipe:
cut -d "|" -f 2- input_filename > output_filename
Then use this output_filename as the input to your load process.
-d "|" tells cut to use the pipe as the delimiter.
-f 2- extracts everything from the second field onward, dropping the empty field in front of the leading pipe.
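For illustration only (the actual command above is all you need), the effect of cut here can be sketched in Python:

```python
# What `cut -d "|" -f 2-` does to one line: drop the empty field
# in front of the leading pipe and rejoin the remaining fields.
line = "|123|456|Dasani Bottled|5|"
cleaned = "|".join(line.split("|")[1:])
print(cleaned)   # 123|456|Dasani Bottled|5|
```

The trailing pipe is kept, but Hive simply ignores the extra empty field after it, so the 4-column table loads correctly.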
The second way is to keep the file as-is and add a dummy first column that absorbs the empty field:
CREATE table TEST_1 (
dummy string,
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';
Then proceed with loading the data. Afterwards you can ignore the dummy column, store the data in a final table without it, or create a view on top of this table that excludes it.
Upvotes: 1