Reputation: 1075
I'm following the example AWS documentation gave for creating a CloudFront log table in Athena.
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
requestip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
useragent STRING,
querystring STRING,
cookie STRING,
resulttype STRING,
requestid STRING,
hostheader STRING,
requestprotocol STRING,
requestbytes BIGINT,
timetaken FLOAT,
xforwardedfor STRING,
sslprotocol STRING,
sslcipher STRING,
responseresulttype STRING,
httpversion STRING,
filestatus STRING,
encryptedfields INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://your_log_bucket/prefix/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
Creating the table with the time
field as a string
doesn't allow me to run conditional queries. I tried re-creating the table with the following:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time timestamp,
....
Unfortunately this did not work and I received no results in the time
field when I previewed the table.
Does anyone have any experience casting the time
to something that I can use to query?
Upvotes: 2
Views: 2956
Reputation: 1305
It's frustrating that there isn't a straightforward way to have usable timestamps (dates with times included) in a table based on CloudFront logs.
However, this is now my workaround:
I create a view based on the original table. Say my original table is cloudfront_prod_logs
. I create a view, cloudfront_prod_logs_w_datetime
that has a proper datetime
/timestamp field and I use that in queries, instead of the original table.
CREATE OR REPLACE VIEW cloudfront_prod_logs_w_datetime AS
SELECT
"date_parse"("concat"(CAST(date AS varchar), ' ', CAST(time AS varchar)), '%Y-%m-%d %H:%i:%s') datetime
, *
FROM
cloudfront_prod_logs
Upvotes: 1
Reputation: 1148
Concat the date and time into a timestamp in a subquery:
WITH ds AS
(SELECT *,
parse_datetime( concat( concat( format_datetime(date,
'yyyy-MM-dd'), '-' ), time ),'yyyy-MM-dd-HH:mm:ss') AS datetime
FROM default.cloudfront_www
WHERE requestip = '207.30.46.111')
SELECT *
FROM ds
WHERE datetime
BETWEEN timestamp '2018-11-19 06:00:00'
AND timestamp '2018-11-19 12:00:00'
Upvotes: 4