Data_101
Data_101

Reputation: 953

Convert Time Stamp while Creating Table in Amazon Athena

I have been using the below query to create a table within Athena,

CREATE EXTERNAL TABLE IF NOT EXISTS test.test_table (
      `converteddate` string,
      `userid` string,
    )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3:XXXX'
TBLPROPERTIES ('has_encrypted_data'='false',"skip.header.line.count"="1")

This returns me:

converteddate        | userid
-------------------------------------
2017-11-29T05:00:00  | 00001
2017-11-27T04:00:00  | 00002
2017-11-26T03:00:00  | 00003
2017-11-25T02:00:00  | 00004
2017-11-24T01:00:00  | 00005

I would like to return:

converteddate        | userid
-------------------------------------
2017-11-29 05:00:00  | 00001
2017-11-27 04:00:00  | 00002
2017-11-26 03:00:00  | 00003
2017-11-25 02:00:00  | 00004
2017-11-24 01:00:00  | 00005

and have converteddate as a datetime and not a string.

Upvotes: 3

Views: 2810

Answers (2)

Alexandre
Alexandre

Reputation: 561

(My answer has one premise: you are using OpenCSVSerDe. It doesn't apply to LazySimpleSerDe, for instance.)

If you have the option of changing the format of your input CSV file, you should convert your timestamp to UNIX Epoch Time. That's the format that OpenCSVSerDe is expecting.

For instance, your sample CSV looks like this:

"converteddate","userid"    
"2017-11-29T05:00:00","00001"
"2017-11-27T04:00:00","00002"
"2017-11-26T03:00:00","00003"
"2017-11-25T02:00:00","00004"
"2017-11-24T01:00:00","00005"

It should be:

"converteddate","userid"
"1511931600000","00001"
"1511755200000","00002"
"1511665200000","00003"
"1511575200000","00004"
"1511485200000","00005"

Those integers are the number of milliseconds since Midnight January 1, 1970 for each one of your original dates.

Then you can run a slightly modified version of your CREATE TABLE statement:

CREATE EXTERNAL TABLE IF NOT EXISTS test.test_table (
      converteddate timestamp,
      userid string
    )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3:XXXX'
TBLPROPERTIES ("skip.header.line.count"="1");

If you query your Athena table with select * from test_table, this will be the result:

       converteddate        userid  
 ------------------------- -------- 
  2017-11-29 05:00:00.000    00001  
  2017-11-27 04:00:00.000    00002  
  2017-11-26 03:00:00.000    00003  
  2017-11-25 02:00:00.000    00004  
  2017-11-24 01:00:00.000    00005  

As you can see, type TIMESTAMP on Athena includes milliseconds.

I wrote a more comprehensive explanation on using types TIMESTAMP and DATE with OpenCSVSerDe. You can read it here.

Upvotes: 1

Dhaval
Dhaval

Reputation: 1076

It is not possible to convert the data while table creation. But you can get the data while querying. You can use date_parse(string,format) -> timestamp function. More details are mentioned here.

For your usecase you can do something like as follows

select date_parse(converteddate, '%y-%m-%dT%H:%i:%s') as converted_timestamp, userid
from test_table

Note : Based on type of your string you have to choose proper specifier for month(always two digits or not), day, hour(12 or 24 hours format), etc

Upvotes: 4

Related Questions