Create external table from csv file in AWS Athena

Question

I am trying to create an external table in AWS Athena from a csv file that is stored in my S3.

The csv file looks as follows. As you can see, the data is not enclosed in quotation marks (") and is delimited by commas (,).

ID,PERSON_ID,DATECOL,GMAT
612766604,54723367,2020-01-15,637
615921503,158634997,2020-01-25,607
610656030,90359154,2020-01-07,670

I tried the following code to create a table:

CREATE EXTERNAL TABLE my_table
    (
        ID string,
        PERSON_ID int,
        DATE_COL date,
        GMAT int
    )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 's3://my_bucket/som_bucket/dat/'
TBLPROPERTIES
    (
        'skip.header.line.count'='1'
    )
;

I tried to preview the table with the following code:

select
    *
from
    my_table
limit 10

Which raises this error:

HIVE_BAD_DATA: Error parsing field value '2020-01-15' for field 2: For input string: "2020-01-15"

My question is: Am I passing the correct serde? And if so, how can I format the date column (DATE_COL) such that it reads and displays days in YYYY-MM-DD?

Arturo Sbr · Accepted Answer

I replaced ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' with FIELDS TERMINATED BY ',' and enclosed the column names with "`". The following code creates the table correctly:

CREATE EXTERNAL TABLE my_table
    (
        `ID` string,
        `PERSON_ID` int,
        `DATE_COL` date,
        `GMAT` int
    )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my_bucket/som_bucket/dat/'
TBLPROPERTIES ('skip.header.line.count'='1')
;

I do not understand the concept of a serde, but I suppose I did not need one to begin with.

Create external table from csv file in AWS Athena

Answers (2)

Related Questions