Reputation: 5678
So I'm using a standard ELK stack to analyse Apache access logs, which is working well, but I'm looking to break out URL parameters as fields, using the KV filter, in order to allow me to write better queries.
My problem is that the app I'm analysing has 'cache-busting' dynamically generated parameters, which leads to tens of thousands of 'fields', each occurring once. Elasticsearch seems to have severe trouble with this, and the fields have no value to me, so I'd like to remove them. Below is an example of the pattern:
GET /page?rand123PQY=ABC&other_var=something
GET /page?rand987ZDQ=DEF&other_var=something
In the example above, the parameters I want to remove start with 'rand'. Currently my logstash.conf uses grok to extract fields from the access logs, followed by kv to extract the query string parameters:
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    type => "apache-access"
  }
  kv {
    field_split => "&?"
  }
}
Is there a way I can filter out any fields matching the pattern rand[A-Z0-9]*=[A-Z0-9]*? Most examples I've seen target fields by exact name, which I can't use here. I did wonder about regexing the request field into a new field, running kv on that, then removing it, roughly as sketched below. Would that work?
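Untested, but something like this is what I had in mind (clean_request and the gsub pattern are just placeholders for the idea):
filter {
  mutate {
    # copy the request so the original field stays intact
    add_field => { "clean_request" => "%{request}" }
  }
  mutate {
    # strip the cache-busting parameters before kv sees them
    gsub => [ "clean_request", "rand[A-Z0-9]*=[A-Z0-9]*&?", "" ]
  }
  kv {
    source => "clean_request"
    field_split => "&?"
  }
  mutate {
    remove_field => [ "clean_request" ]
  }
}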
Upvotes: 4
Views: 5905
Reputation: 1367
I know this is dated and has been answered, but for anyone looking into it as of 2017: there's a plugin named prune that allows you to trim fields based on different criteria, including name patterns.
prune {
  blacklist_names => ["[0-9]+", "unknown_fields", "tags"]
}
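Applied to the question's cache-busting parameters, it could look something like this (the pattern is my guess from the examples above, not tested against the real field names):
filter {
  prune {
    # drop any field whose name matches the rand... pattern
    blacklist_names => [ "^rand[A-Z0-9]*$" ]
  }
}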
Upvotes: 5
Reputation: 11571
If the set of fields that you are interested in is known and well-defined, you could set target for the kv filter, move the interesting fields to the top level of the message with a mutate filter, and delete the field with the nested key/value pairs. I think this is pretty much what you suggested at the end.
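Roughly like this, assuming other_var is one of the fields you care about and query_params is just an arbitrary target name:
filter {
  kv {
    field_split => "&?"
    target => "query_params"
  }
  mutate {
    # promote the interesting parameters to the top level
    rename => { "[query_params][other_var]" => "other_var" }
  }
  mutate {
    # drop the nested hash, including all the rand... keys
    remove_field => [ "query_params" ]
  }
}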
Alternatively you could use a ruby filter:
filter {
  ruby {
    code => "
      event.to_hash.keys.each { |k|
        if k.start_with?('rand')
          event.remove(k)
        end
      }
    "
  }
}
Upvotes: 7