Craig
Craig

Reputation: 1985

Hive external table delimited by commas, but comma present in data

I have some data coming in from an external source of the format:

user_id, user_name, project_name, position
"111", "Tom Petty", "Heartbreakers", "Vocals"
"222", "Ringo Starr", "Beatles, The", "Drummer"
"333", "Tom Brady", "Patriots", "QB"

And I create my external table thusly:

CREATE EXTERNAL TABLE tab1 (
USER_ID String,
USER_NAME String,
PROJECT_NAME String,
POSITION String
) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/blah/foo'

The problem occurs when data in some of the columns have embedded commas in them, Beatles, The for instance. This results in Hive putting the word The into the next column (position) and dropping the data in the last column.

All the incoming data fields are wrapped in double quotes but they are comma delimited even though they may have commas in them. Unfortunately having the sender clean the data is not an option.

How can I go about creating this table?

Upvotes: 0

Views: 2015

Answers (2)

Sagar Morakhia
Sagar Morakhia

Reputation: 797

try this

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)  

Upvotes: 1

Rishu S
Rishu S

Reputation: 3968

You can try using Open CSV Serde in your hive table creation using specific serDe properties.

https://cwiki.apache.org/confluence/display/Hive/CSV+Serde

Upvotes: 1

Related Questions