Reputation: 2491
I have added a table in Athena for querying application load balancer logs. I have created the table using below query and added partition after that according to the data stores in s3. But still I am not able to get the required data with query.
Table Creation Query:
CREATE EXTERNAL TABLE IF NOT EXISTS {{DATABASE_NAME.TABLE_NAME}} (
type string,
time string,
elb string,
client_ip string,
client_port string,
target string,
request_processing_time int,
target_processing_time int,
response_processing_time int,
elb_status_code int,
target_status_code string,
received_bytes int,
sent_bytes int,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) ([^ ]*)\" \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)'
) LOCATION 's3://{{BUCKET_NAME}}/AWSLogs/{{ACCOUNT_ID}}/elasticloadbalancing/us-west-2/';
Partition Query:
ALTER TABLE alb_webapp add partition (year="2018", month="*", day="*")
location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
When I am trying to run a simple query say "select" it is giving me the zero results found.
I want the partition to be made on the basis of year or month.
Upvotes: 3
Views: 2569
Reputation: 2491
It seems AWS have again changed their format of alb logs. AWS documents have new regex to work with new log format. Below query worked for me.
Table creation:
CREATE EXTERNAL TABLE IF NOT EXISTS webapp_alb (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
new_field string
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\"($| \"[^ ]*\")(.*)')
LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/';
Adding Partitions:
ALTER TABLE webapp_alb add partition (year="2018", month="*", day="*") location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
Ref:
https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html https://docs.aws.amazon.com/athena/latest/ug/partitions.html
Upvotes: 5