Reputation: 2197
I've set up AWS WAF Logging to S3, and created an Athena table as described in the documentation ( https://docs.aws.amazon.com/athena/latest/ug/waf-logs.html )
However, http headers are stored as an array<struct<name: string, value: string>>
not a map (for valid reasons). I'd like to write a query like
select headers.user-agent, headers.if-none-match from waf_logs where something;
if-none-match may or may not be present in the headers list.
using CROSS JOIN UNNEST(httprequest.headers)
doesn't work, as this will create multiple rows. Using map notation doesn't work, as its a array of struct, not a map.
There are a lot of pages on the internet about how to set up the table, but not so many with worked examples of real-life queries, and none that I can find about how to query by nested attributes.
I'd really appreciate any suggestions. Thanks!
Upvotes: 1
Views: 4704
Reputation: 4649
I strongly recommend all developers not to use UNNEST if possible, you don't want cartesian products most of the time and you should definitely want to avoid unnesting multiple times as the number of rows will be unimaginably high. Just cast httprequest.headers
then use filter to extract value.
It is also tedious and unstable to unnest then group again just to get the value and if you need to extract multiple values from httprequest.headers
by key, you simply cannot get the 100% correct result easily and stably.
Remember, most developers think they know SQL well and mastered it, only a few realize how hard to get it right and how easy SQL returns an illusional and convincible result making you think you wrote the correct SQL and got correct result.
WITH
targets AS (
SELECT
*,
cast(
httprequest.headers as ARRAY(ROW(name VARCHAR, value VARCHAR))
) as headers
FROM your_app.waf_logs
WHERE
"action" NOT IN ('BLOCK', 'ALLOW')
AND date >= '2023/04/18'
),
result AS (
SELECT
to_iso8601(from_unixtime(timestamp / 1000)) as time_ISO_8601,
terminatingruleid,
labels,
try(
filter(
headers,
x -> LOWER(x.name) = 'user-agent'
)[1].value
) AS UserAgent,
try(
filter(
headers,
x -> LOWER(x.name) = 'x-forwarded-for'
)[1].value
) AS XForwardedForIP
FROM targets
)
SELECT
*
FROM result
;
Upvotes: 2
Reputation: 1902
Here's a WAF query that should do the trick for request headers:
WITH waf_data AS (
SELECT
waf.action as action,
waf.httprequest.clientip as clientip,
waf.httprequest.country as country,
map_agg(f.name, f.value) AS kv
FROM "waf_logs" waf,
UNNEST(waf.httprequest.headers) AS t(f)
GROUP BY 1, 2, 3
)
SELECT
waf_data.action,
waf_data.clientip,
waf_data.country,
waf_data.kv['Host'] AS host,
waf_data.kv['User-Agent'] as UserAgent,
waf_data.kv['Cookie'] as cookie
FROM waf_data
WHERE waf_data.kv['Host'] like 'waf_alb.us-east-2.elb.amazonaws.com'
LIMIT 10;
Upvotes: 2
Reputation: 45352
I used the following to extract the HTTP header values (here Host
field by name):
SELECT action, header.value as hostname, clientip, timestamp
from (
SELECT
httprequest.clientip as clientip,
action,
timestamp,
httprequest.headers as headers
FROM waf_logs
)
cross join unnest(headers) as c(header)
where lower(header.name) = 'host'
I used this post to extract the array of row which is formatted like array(row("name" varchar,"value" varchar))
in the waf logs database
Upvotes: 0