TheGoat

Reputation: 2877

Loading unstructured CSV data into Hive

I would like to load a CSV file containing 250,000 posts from Stack Exchange into Hive. The CSV has the following columns:

    Id  Score   ViewCount   ParentId    Body    DisplayName rnk

Every field is delimited by a ",", but the field that screws everything up is Body.

Body holds the contents of the top 250,000 posts on the site, so it contains all sorts of characters, commas included. There is one post per row, 250,000 rows in total.
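To see why Body breaks a plain comma split, here is a small Python sketch (Python is used only for illustration, and the row is hypothetical, not taken from the actual file). It assumes Body is wrapped in double quotes in the CSV. A naive split on "," yields extra columns, while a quote-aware parser such as Python's csv module keeps the quoted Body as one field. OpenCSVSerde's quoteChar setting relies on the same idea: a field wrapped in quoteChar may contain separatorChar.

```python
import csv
import io

# Hypothetical row in the question's column order:
# Id, Score, ViewCount, ParentId, Body, DisplayName, rnk
# (ParentId is empty here; Body is quoted because it contains commas.)
line = '42,10,300,,"Hello, world, from a post",alice,1'

# Naive approach: split on every comma, ignoring quotes.
naive = line.split(",")
print(len(naive))  # 9 -> the commas inside Body produce two extra columns

# Quote-aware approach: csv.reader treats "..." as a single field.
parsed = next(csv.reader(io.StringIO(line)))
print(len(parsed))  # 7
print(parsed[4])    # Hello, world, from a post
```

If Body is not quoted in the source file, no quoteChar setting can tell its commas apart from the real delimiters.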

I've read up on SerDes and regular expressions, but I'm still getting NULL values in my Hive table.

    CREATE TABLE dataStore(Id string, Score string, ViewCount string, ParentId string, Body String, DisplayName String, Rank String)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = """",
    "escapeChar"    = "\"
    )  
    STORED AS TEXTFILE;

Upvotes: 1

Views: 418

Answers (1)

maxymoo

Reputation: 36555

I normally use ogrodnek's SerDe; you might have more luck with that. Also, I don't think you're escaping your special characters properly. I believe you need:

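    "quoteChar"     = "\"",
    "escapeChar"    = "\\"

Inside a Hive string literal the backslash is itself an escape character, so "\"" is a single double quote and "\\" is a single backslash. In the original, "\" is never closed, because \" is read as an escaped quote.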
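Correct quote and escape settings only help if each post sits on one physical line. With STORED AS TEXTFILE, Hive splits the input into records at line breaks before the SerDe sees them, so if Body contains newlines (an assumption here; the question doesn't say), one post gets spread over several rows and the columns can come back shifted or NULL. One workaround, sketched below in Python as an assumed preprocessing step rather than anything from the answer above, is to rewrite the file so line breaks inside fields become spaces. flatten_records is a hypothetical helper name.

```python
import csv
import io


def flatten_records(src, dst):
    """Copy CSV rows from src to dst, replacing line breaks inside fields.

    Hypothetical preprocessing helper: every CR/LF inside a field becomes a
    space, so each record occupies exactly one physical line in the output.
    All fields are re-quoted; embedded double quotes are written as "".
    Returns the number of records written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    count = 0
    for row in reader:
        cleaned = [
            field.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
            for field in row
        ]
        writer.writerow(cleaned)
        count += 1
    return count


# Demo with an in-memory sample: the first Body-like value spans two lines.
sample = io.StringIO('1,"line one\nline two",x\n2,plain,y\n')
out = io.StringIO()
print(flatten_records(sample, out))  # 2
print(out.getvalue(), end="")
# "1","line one line two","x"
# "2","plain","y"
```

For the real file, open both the input and output with newline="" as the csv module documentation recommends, then load the rewritten copy into Hive. Note that this changes Body's text, so it only fits if line structure inside posts doesn't matter for your analysis.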

Upvotes: 1
