Loading unstructured CSV data into Hive

Question

I would like to load a CSV file containing 250,000 Stack Exchange posts into Hive. The CSV has the following format:

    Id  Score   ViewCount   ParentId    Body    DisplayName rnk

Every field is delimited by a comma (","), but the field that screws everything up is Body.

Body contains the text of the top 250,000 posts on the site, so it includes all sorts of characters. There is one post per row, 250,000 rows in total.

I've read up on SerDes and regular expressions, but I am still getting null values in my Hive table. Here is my table definition:

    CREATE TABLE dataStore(
        Id string, Score string, ViewCount string, ParentId string,
        Body String, DisplayName String, Rank String
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = """",
    "escapeChar"    = "\"
    )  
    STORED AS TEXTFILE;

maxymoo · Accepted Answer

I normally use ogrodnek's csv-serde; you might have more luck with that. Also, I don't think you're escaping your special characters properly. I believe you need:

    "quoteChar"     = "\"",
    "escapeChar"    = "\\"
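For reference, here is a sketch of the question's full table definition with those two properties written as valid HiveQL string literals, where `"\""` is a single double-quote character and `"\\"` is a single backslash. It assumes Hive 0.14 or later (where OpenCSVSerde is available), that the first line of the file is the header row shown in the question, and a hypothetical file path `/tmp/posts.csv`.

```sql
-- Sketch only: the question's table with the quote and escape characters
-- escaped so HiveQL parses each one as a single character.
-- Assumes Hive 0.14 or later, where OpenCSVSerde is available.
CREATE TABLE dataStore (
  Id          STRING,
  Score       STRING,
  ViewCount   STRING,
  ParentId    STRING,
  Body        STRING,
  DisplayName STRING,
  Rank        STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE
-- Assumption: the first line of the file is the header row, so skip it.
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Hypothetical path: replace with the actual location of the CSV file.
LOAD DATA LOCAL INPATH '/tmp/posts.csv' INTO TABLE dataStore;
```

OpenCSVSerde returns every column as a string regardless of the declared type, so keeping all columns as STRING and casting in queries is the usual approach. ogrodnek's csv-serde accepts the same three property names; after an `ADD JAR` pointing at its jar, the SerDe line becomes `ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'`. One caveat: a TEXTFILE table splits records on newlines before the SerDe sees them, so if any Body values contain line breaks, those rows will likely still come through broken or with nulls, and the line breaks would need to be removed or replaced before loading.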
