whobutsb
whobutsb

Reputation: 1075

AWS Athena - Cast CloudFront log time field to timestamp

I'm following the example AWS documentation gave for creating a CloudFront log table in Athena.

CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
  `date` DATE,
  time STRING,
  location STRING,
  bytes BIGINT,
  requestip STRING,
  method STRING,
  host STRING,
  uri STRING,
  status INT,
  referrer STRING,
  useragent STRING,
  querystring STRING,
  cookie STRING,
  resulttype STRING,
  requestid STRING,
  hostheader STRING,
  requestprotocol STRING,
  requestbytes BIGINT,
  timetaken FLOAT,
  xforwardedfor STRING,
  sslprotocol STRING,
  sslcipher STRING,
  responseresulttype STRING,
  httpversion STRING,
  filestatus STRING,
  encryptedfields INT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
LOCATION 's3://your_log_bucket/prefix/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )

Creating the table with the time field as a string doesn't allow me to run conditional queries. I tried re-creating the table with the following:

CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
  `date` DATE,
  time timestamp,
....

Unfortunately this did not work and I received no results in the time field when I previewed the table.

Does anyone have any experience casting the time to something that I can use to query?

Upvotes: 2

Views: 2956

Answers (2)

Jamie Jackson
Jamie Jackson

Reputation: 1305

It's frustrating that there isn't a straightforward way to have usable timestamps (dates with times included) in a table based on CloudFront logs.

However, this is now my workaround:

I create a view based on the original table. Say my original table is cloudfront_prod_logs. I create a view, cloudfront_prod_logs_w_datetime that has a proper datetime/timestamp field and I use that in queries, instead of the original table.

CREATE OR REPLACE VIEW cloudfront_prod_logs_w_datetime AS 
SELECT
  "date_parse"("concat"(CAST(date AS varchar), ' ', CAST(time AS varchar)), '%Y-%m-%d %H:%i:%s') datetime
, *
FROM
  cloudfront_prod_logs

Upvotes: 1

Derrick Petzold
Derrick Petzold

Reputation: 1148

Concat the date and time into a timestamp in a subquery:

WITH ds AS 
    (SELECT *,
         parse_datetime( concat( concat( format_datetime(date,
         'yyyy-MM-dd'), '-' ), time ),'yyyy-MM-dd-HH:mm:ss') AS datetime
    FROM default.cloudfront_www
    WHERE requestip = '207.30.46.111')
SELECT *
FROM ds
WHERE datetime
    BETWEEN timestamp '2018-11-19 06:00:00'
        AND timestamp '2018-11-19 12:00:00'

Upvotes: 4

Related Questions