Reputation: 5678
So I'm using a standard ELK stack to analyse Apache access logs, which is working well, but I'm looking to break out URL parameters as fields, using the KV filter, in order to allow me to write better queries.
My problem is that the app I'm analysing has 'cache-busting' dynamically generated parameters, which leads to tens of thousands of 'fields', each occurring once. Elasticsearch seems to have severe trouble with this, and the fields have no value to me, so I'd like to remove them. Below is an example of the pattern:
GET /page?rand123PQY=ABC&other_var=something
GET /page?rand987ZDQ=DEF&other_var=something
In the example above, the parameters I want to remove start with 'rand'. Currently my logstash.conf uses grok to extract fields from the access logs, followed by kv to extract the query string parameters:
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    type => "apache-access"
  }
  kv {
    field_split => "&?"
  }
}
Is there a way I can filter out any fields matching the pattern rand[A-Z0-9]*=[A-Z0-9]*? Most examples I've seen target fields by exact name, which I can't use here. I did wonder about regexing the request field into a new field, running kv on that, then removing it, roughly as sketched below. Would that work?
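Untested, but something like this is what I had in mind (clean_request and the gsub pattern are just placeholders for the idea):
filter {
  mutate {
    # copy the request so the original field stays intact
    add_field => { "clean_request" => "%{request}" }
  }
  mutate {
    # strip the cache-busting parameters before kv sees them
    gsub => [ "clean_request", "rand[A-Z0-9]*=[A-Z0-9]*&?", "" ]
  }
  kv {
    source => "clean_request"
    field_split => "&?"
  }
  mutate {
    remove_field => [ "clean_request" ]
  }
}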
Upvotes: 4
Views: 5905
Reputation: 1367
I know this is dated and has been answered, but for anyone looking into it as of 2017: there's a plugin named prune that allows you to trim fields based on different criteria, including name patterns.
prune {
  blacklist_names => ["[0-9]+", "unknown_fields", "tags"]
}
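Applied to the question's cache-busting parameters, it could look something like this (the pattern is my guess from the examples above, not tested against the real field names):
filter {
  prune {
    # drop any field whose name matches the rand... pattern
    blacklist_names => [ "^rand[A-Z0-9]*$" ]
  }
}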
Upvotes: 5
Reputation: 11571
If the set of fields that you are interested in is known and well-defined, you could set target for the kv filter, move the interesting fields to the top level of the message with a mutate filter, and delete the field with the nested key/value pairs. I think this is pretty much what you suggested at the end.
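Roughly like this, assuming other_var is one of the fields you care about and query_params is just an arbitrary target name:
filter {
  kv {
    field_split => "&?"
    target => "query_params"
  }
  mutate {
    # promote the interesting parameters to the top level
    rename => { "[query_params][other_var]" => "other_var" }
  }
  mutate {
    # drop the nested hash, including all the rand... keys
    remove_field => [ "query_params" ]
  }
}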
Alternatively you could use a ruby filter:
filter {
  ruby {
    code => "
      event.to_hash.keys.each { |k|
        if k.start_with?('rand')
          event.remove(k)
        end
      }
    "
  }
}
Upvotes: 7