Reputation: 1985
I have some data coming in from an external source of the format:
user_id, user_name, project_name, position
"111", "Tom Petty", "Heartbreakers", "Vocals"
"222", "Ringo Starr", "Beatles, The", "Drummer"
"333", "Tom Brady", "Patriots", "QB"
And I create my external table thusly:
CREATE EXTERNAL TABLE tab1 (
USER_ID String,
USER_NAME String,
PROJECT_NAME String,
POSITION String
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/blah/foo'
The problem occurs when data in some of the columns have embedded commas in them, Beatles, The
for instance. This results in Hive putting the word The
into the next column (position) and dropping the data in the last column.
All the incoming data fields are wrapped in double quotes but they are comma delimited even though they may have commas in them. Unfortunately having the sender clean the data is not an option.
How can I go about creating this table?
Upvotes: 0
Views: 2015
Reputation: 797
try this
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "\""
)
Upvotes: 1
Reputation: 3968
You can try using Open CSV Serde in your hive table creation using specific serDe properties.
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
Upvotes: 1