Jeeva Bharathi
Jeeva Bharathi

Reputation: 554

AWS Athena's conversion from Epoch to timestamp using create table populated with wrong data

I have loaded S3 with a single parquet file for testing the Athena query.

Once I uploaded the file to S3, I used S3 select query to check the data.

Sample:

Status
 Successfully returned 5 records in 460 ms
Bytes returned: 3278 B
{
"test_date":1467936000
}

I used this one parquet file to create table using the following query

CREATE EXTERNAL TABLE `test_table`(
  `test_date` timestamp)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://xxxxxxxxx/'
TBLPROPERTIES (
  'has_encrypted_data'='false')

When I queried the table, it gave me

    test_date
1   1970-01-17 23:45:36.000
2   1970-01-17 23:45:36.000
3   1970-01-17 23:45:36.000
etc.

But converting the raw integer in parquet file, i.e., 1467936000 using python gives me the correct datetime.

>datetime.datetime.fromtimestamp(1467936000)
datetime.datetime(2016, 7, 8, 5, 30)

How do I make the Athena interpret the Epoch as timestamp properly?

Upvotes: 3

Views: 1134

Answers (1)

Jeeva Bharathi
Jeeva Bharathi

Reputation: 554

Apparently Athena requires the Epoch format to be in milliseconds. I had it in seconds.

Multiplying by 1000 in the data file fixed the issue for me.

Example:

1467936000000 properly got converted to 2016-07-08 00:00:00.000

Upvotes: 6

Related Questions