tomsoft

Reputation: 4567

Logstash rsyslog + apache

I would like to use rsyslog to collect Apache logs and process them with Logstash.

The logs are received correctly by rsyslog and then by Logstash, but I would like to extract the Apache log line from the message part of the rsyslog event.

For instance, here is the line received in logstash. The last part is the apache log.

2015-09-20T16:27:30.000Z 1.1.20.133 <173>Sep 20 16:27:30 ip-12-1-8-7 apache[26914]: 10.25.52.66 - - [20/Sep/2015:16:27:30 +0000] "GET / HTTP/1.1" 200 - "-" "Dalvik/1.6.0 (Linux; U; Android 4.2.2; MID Build/JDQ39)" "-"

I would like to extract the Apache part and then parse it again:

10.25.52.66 - - [20/Sep/2015:16:27:30 +0000] "GET / HTTP/1.1" 200 - "-" "Dalvik/1.6.0 (Linux; U; Android 4.2.2; MID Build/JDQ39)" "-"

I guess this can be done with grok. Is it possible to apply a first grok filter to identify the syslog line, extract the syslog message, and then parse that message as an Apache log?

The filter used to extract the rsyslog is the following:

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
}

Now, how can I use syslog_message to extract the Apache data? Do I need a single grok match, or can I do this in two steps: extract the syslog data, then parse the Apache lines with a second grok?

The following works, but I was wondering if there is a better way that avoids the duplication:

filter {
 if [type] == "syslog" {
   grok {
     match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
     add_field => [ "received_at", "%{@timestamp}" ]
     add_field => [ "received_from", "%{host}" ]
   }
   grok {
     match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}  ${COMBINEDAPACHELOG}" }
   }
  }
}

Upvotes: 0

Views: 839

Answers (1)

Alain Collins

Reputation: 16362

You're very close!

In the second grok, use the syslog_message field as the input, and only COMBINEDAPACHELOG as the pattern.

That's a good way to post-process a field with grok to extract more information from it, as you have done.
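
As a minimal sketch of that two-step approach, reusing the field names from the question (the `[syslog_program] == "apache"` guard is my assumption based on the sample line, not something your setup requires):

```
filter {
  if [type] == "syslog" {
    # Step 1: split the syslog envelope from its payload.
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }

    # Step 2: parse only the payload extracted above as an Apache line.
    # Assumption: Apache events arrive with the program name "apache".
    if [syslog_program] == "apache" {
      grok {
        match => { "syslog_message" => "%{COMBINEDAPACHELOG}" }
      }
    }
  }
}
```

This way the syslog pattern is written only once, and other programs logging through the same rsyslog input are not run through the Apache pattern.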

Since the log file will only ever have one format in it, you can also combine the two groks into one:

 match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{COMBINEDAPACHELOG}" }
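
For context, that match could sit in a complete filter block like the sketch below. The `[type] == "syslog"` condition and the two `add_field` lines are simply carried over from the question; drop them if you don't need them.

```
filter {
  if [type] == "syslog" {
    grok {
      # One pass: syslog header first, then the Apache combined log format.
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{COMBINEDAPACHELOG}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
  }
}
```

The trade-off is that any syslog line that is not an Apache entry will fail this single pattern and, with grok's default settings, be tagged `_grokparsefailure`, so the combined form only makes sense when nothing else is sent over this input.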

Upvotes: 1
