Pavel

Reputation: 27

Bytes from nginx logs are mapped as string, not number, in Elasticsearch

Recently I deployed ELK and started forwarding logs from nginx through logstash-forwarder.

The problem is that in Elasticsearch (1.4.2) / Kibana (4), the "bytes" value of the request is mapped as a string.

I use the standard configuration found everywhere.

I added a new pattern for nginx logs to the Logstash patterns:

NGUSERNAME [a-zA-Z\.\@\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float} %{NUMBER:upstream_time:float}
NGINXACCESS %{IPORHOST:http_host} %{IPORHOST:clientip} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float}

I added this configuration for Logstash:

input {
  lumberjack {
    port => 5000
    type => "logs"
    ssl_certificate => "/etc/logstash/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/tls/private/logstash-forwarder.key"
  }
}
filter {
    if [type] == "syslog" {
        grok {
            match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
            add_field => [ "received_at", "%{@timestamp}" ]
            add_field => [ "received_from", "%{host}" ]
        }
        syslog_pri { }
        date {
            match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
        }
    } else if [type] == "nginx" {
        grok {
            match => { "message" => "%{NGINXACCESS}" }
        }
        date {
            match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
        }
        geoip {
            source => "clientip"
        }
    }
}
output {
  elasticsearch_http {
    host => "localhost"
  }
}

But in Elasticsearch I still see it as a string, even if I define "bytes" as long:

(?:%{NUMBER:bytes:long}|-)

Does anybody know how to store "bytes" as a number type?

Thanks

Upvotes: 1

Views: 783

Answers (1)

Magnus Bäck

Reputation: 11571

You're on the right track with (?:%{NUMBER:bytes:long}|-), but "long" isn't a valid data type. Quoting the grok documentation:

Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic’s data type, for example change a string to an integer then suffix it with the target data type. For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.
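
Applied to your pattern, that means using int instead of long:

(?:%{NUMBER:bytes:int}|-)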

Note that this doesn't control the data type that's actually used in the indexing on the Elasticsearch side, only the data type of the JSON document that's sent to Elasticsearch (which may or may not affect which mapping ES uses). In the JSON context there's no difference between ints and longs; scalar values are either numbers, bools, or strings.
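
If you want to pin down the data type on the Elasticsearch side, the usual approach is an index template rather than anything in Logstash. A minimal sketch for ES 1.x, assuming the default logstash-* index naming (the template name nginx_bytes is just a placeholder):

curl -XPUT 'http://localhost:9200/_template/nginx_bytes' -d '
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "bytes": { "type": "long" }
      }
    }
  }
}'

Keep in mind that a template only applies to indices created after it's installed; existing indices keep their current mapping until they roll over or are reindexed.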

Upvotes: 1
