I would like to load a CSV file containing 250,000 posts from Stack Exchange into Hive. The CSV has the following columns:
Id Score ViewCount ParentId Body DisplayName rnk
Every field is delimited by a comma, but the field that screws everything up is Body.
Body holds the contents of the top 250,000 posts on the site, so it contains all sorts of characters. There is one post per row, 250,000 rows in total.
I've read up on SerDes and regex-based parsing, but I'm still getting NULL values in my Hive table with this definition:
CREATE TABLE dataStore(Id string, Score string, ViewCount string, ParentId string, Body String, DisplayName String, Rank String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = """",
"escapeChar" = "\"
)
STORED AS TEXTFILE;
I normally use ogrodnek's CSV SerDe; you might have more luck with that. Also, I don't think you're escaping your special characters properly. I believe you need:
"quoteChar" = "\"",
"escapeChar" = "\\"
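For reference, here is a sketch of the question's statement with those two properties substituted in; it has not been tested against this dataset. In a Hive string literal the backslash escapes the next character, so "\"" is a single double quote and "\\" a single backslash, whereas the original "\" never closes because its backslash escapes the closing quote. According to the Hive CSV SerDe documentation these values are also OpenCSVSerde's defaults, and that SerDe reads every column as a string anyway, which matches the all-string schema. The LOAD DATA path is a hypothetical placeholder.

```sql
CREATE TABLE dataStore (
  Id string,
  Score string,
  ViewCount string,
  ParentId string,
  Body string,
  DisplayName string,
  Rank string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
-- quoteChar is one double-quote character, escapeChar is one backslash
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

-- Hypothetical path: point this at the real CSV file
LOAD DATA LOCAL INPATH '/path/to/posts.csv' OVERWRITE INTO TABLE dataStore;
```

One caveat worth checking: the same Hive documentation notes that this SerDe does not handle newlines embedded inside quoted fields, so if Body contains line breaks, rows may still be split and show up with NULLs.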