I have loaded a single Parquet file into S3 to test an Athena query.
After uploading the file, I used S3 Select to check the data.
Sample:
Status
Successfully returned 5 records in 460 ms
Bytes returned: 3278 B
{
"test_date":1467936000
}
I then created a table over this one Parquet file using the following DDL:
CREATE EXTERNAL TABLE `test_table`(
  `test_date` timestamp)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://xxxxxxxxx/'
TBLPROPERTIES (
  'has_encrypted_data'='false')
When I queried the table, it returned:
test_date
1 1970-01-17 23:45:36.000
2 1970-01-17 23:45:36.000
3 1970-01-17 23:45:36.000
etc.
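For reference, those 1970 dates are exactly what you get if the raw value is treated as milliseconds since the Unix epoch rather than seconds: 1467936000 ms is roughly 17 days. A minimal check with the standard library:

```python
import datetime

# Interpret the raw value as *milliseconds* since the Unix epoch
raw = 1467936000
dt = datetime.datetime(1970, 1, 1) + datetime.timedelta(milliseconds=raw)
print(dt)  # 1970-01-17 23:45:36
```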
But converting the raw integer from the Parquet file, i.e., 1467936000, using Python gives the correct datetime (note that fromtimestamp converts to local time, which explains the 05:30 offset):
>>> datetime.datetime.fromtimestamp(1467936000)
datetime.datetime(2016, 7, 8, 5, 30)
How do I make the Athena interpret the Epoch as timestamp properly?
Upvotes: 3
Views: 1134
Apparently Athena requires the epoch value to be in milliseconds; I had it in seconds. Multiplying by 1000 in the data file fixed the issue for me.
Example: 1467936000000 was properly converted to 2016-07-08 00:00:00.000.
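The conversion can be sanity-checked in Python; a minimal sketch (the timedelta arithmetic stays in UTC, avoiding the local-timezone shift that fromtimestamp applies):

```python
import datetime

# Scale the seconds value up to milliseconds, as Athena expects
epoch_ms = 1467936000 * 1000  # 1467936000000
dt = datetime.datetime(1970, 1, 1) + datetime.timedelta(milliseconds=epoch_ms)
print(dt)  # 2016-07-08 00:00:00
```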
Upvotes: 6