nixmind

Reputation: 2266

Logstash keeps doing s3 input task but never sends output events

I have an issue with my Logstash s3 input. The last messages I see in my Kibana interface are from several days ago. I have an AWS ELB with access logs enabled. I've tested from the command line and I can see that Logstash is continuously processing inputs but never outputs anything. In the ELB S3 bucket there is one folder per day/month/year, each folder contains several log files, and the total size is around 60GB.
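
For reference, the keys under that prefix follow the usual ELB access-log layout, roughly like this (the file name below is just an illustration of the naming scheme, not an actual object from my bucket):

    rtb/smaato/AWSLogs/653589716289/elasticloadbalancing/eu-west-1/2016/12/31/
        653589716289_elasticloadbalancing_eu-west-1_<elb-name>_20161231T2305Z_<ip>_<random-string>.log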

It was working fine at the beginning, but as the logs grew it became slow, and now my logs never show up on the output side. Logstash keeps doing the input and filter tasks, but never outputs any logs.

I created a dedicated configuration file for testing, with only s3 as input, and tested it on a dedicated machine from the command line:

/opt/logstash/bin/logstash agent -f /tmp/s3.conf  --debug  2>&1 | tee  /tmp/logstash.log

The s3.conf file:

admin@ip-10-3-27-129:~$ cat /tmp/s3.conf 
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# !!!!!!!!!  This file is managed by SALT  !!!!!!!!!
# !!!!!!!!!    All changes will be lost    !!!!!!!!!
# !!!!!!!!!     DO NOT EDIT MANUALLY !     !!!!!!!!!
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

#--[ INPUT ]----------------------------------------------------------------
input
{

  # Logs ELB API
  s3
  {
    bucket => "s3.prod.elb.logs.eu-west-1.mydomain"
    prefix => "rtb/smaato/AWSLogs/653589716289/elasticloadbalancing/"
    interval => 30
    region => "eu-west-1"
    type => "elb_access_log"
  }

}



#--[ FILTER ]---------------------------------------------------------------
filter
{

    # Set the HTTP request time to @timestamp field
    date {
      match => [ "timestamp", "ISO8601" ]
      remove_field => [ "timestamp" ]
    }


  # Parse the ELB access logs
  if [type] == "elb_access_log" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp:date} %{HOSTNAME:loadbalancer} %{IP:client_ip}:%{POSINT:client_port:int} (?:%{IP:backend_ip}:%{POSINT:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} %{INT:backend_status_code:int} %{INT:received_bytes:int} %{INT:sent_bytes:int} %{INT:sent_bytes_ack:int} \"%{WORD:http_method} %{URI:url_asked} HTTP/%{NUMBER:http_version}\" \"%{GREEDYDATA:user_agent}\" %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol}" ]
      remove_field => [ "message" ]
    }

    kv {
      field_split => "&?"
      source => "url_asked"
    }

    date {
      match => [ "timestamp", "ISO8601" ]
      remove_field => [ "timestamp" ]
    }
  }

  # Remove the filebeat input tag
  mutate {
    remove_tag => [ "beats_input_codec_plain_applied" ]
  }

  # Remove field tags if empty
  if [tags] == [] {
    mutate {
      remove_field => [ "tags" ]
    }
  }

  # Remove some unnecessary fields to make Kibana cleaner
  mutate {
    remove_field => [ "@version", "count", "fields", "input_type", "offset", "[beat][hostname]", "[beat][name]", "[beat]" ]
  }

}

#--[ OUTPUT ]---------------------------------------------------------------
output
#{
#  elasticsearch {
#    hosts => ["10.3.16.75:9200"]
#  }
#}
{
#  file {
#    path => "/tmp/logastash/elb/elb_logs.json"
#  }
  stdout { codec => rubydebug }
}
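
For context, the ELB access-log lines this grok is meant to parse look like the following (a sample line in the format documented by AWS, not an actual line from my logs):

    2016-12-31T23:05:11.283402Z my-elb 192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -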

And I can see the input processing, the filtering, and messages like "will start output worker.....", but no output event is ever received.
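
To make that concrete, the "output received" debug messages can be counted in the captured log; for me the count stays at zero:

    grep -c 'output received' /tmp/logstash.log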

I created a new folder (named test_elb) in the bucket, copied the logs from one day's folder (31/12/2016 for example) into it, and then set the newly created folder as the prefix in my input configuration, like this:

 s3
  {
    bucket => "s3.prod.elb.logs.eu-west-1.mydomain"
    prefix => "rtb/smaato/AWSLogs/653589716289/test_elb/"
    interval => 30
    region => "eu-west-1"
    type => "elb_access_log"
  }

And with that s3 prefix, Logstash does the whole pipeline processing (input, filter, output) as expected, and I see my logs in the output. So it seems to me that the bucket is too large and the logstash s3 input plugin has difficulty processing it. Can someone advise on this problem, please?
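
For the record, the one-day copy can be done with the AWS CLI along these lines (the source path below is only an illustration of the day-folder layout, not the exact path I used):

    aws s3 cp --recursive \
        s3://s3.prod.elb.logs.eu-west-1.mydomain/rtb/smaato/AWSLogs/653589716289/elasticloadbalancing/eu-west-1/2016/12/31/ \
        s3://s3.prod.elb.logs.eu-west-1.mydomain/rtb/smaato/AWSLogs/653589716289/test_elb/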

My Logstash version: 2.2.4. Operating system: Debian Jessie.

I've searched and asked on the discuss.elastic.co forum and in the Elasticsearch IRC channel, with no real solution. Do you think it could be a bucket size issue?

Thanks for the help.

Regards.

Upvotes: 3

Views: 3757

Answers (2)

Steven Ensslen

Reputation: 1376

This behaviour is controlled by the watch_for_new_files parameter. With the default setting, true, Logstash will not process the existing files and will wait for new files to arrive.

Example:

input {
  s3 {
    bucket => "the-bucket-name"
    prefix => "the_path/ends_with_the_slash/"
    interval => 30
    region => "eu-west-1"
    type => "elb_access_log"
    watch_for_new_files => false
  }
}

output {
  stdout {}
}

Upvotes: 1

Timothy Gonzalez

Reputation: 1908

  1. Configure the s3 input plugin to move files, once processed, to a bucket/path that is not considered by the input.
  2. While there are many files in the input bucket/path, you may need to run Logstash on a subset of the data until it moves the files to the processing bucket/path.

This is what I'm doing to process about 0.5 GiB (several hundred thousand files) per day. Logstash will pull all of the object names before doing any inserts, so the process will appear to be stuck if you have a huge number of files in your bucket.

    input {
      s3 {
        bucket => "BUCKET_NAME"
        prefix => "logs/2017/09/01"
        backup_add_prefix => "sent-to-logstash-"
        backup_to_bucket => "BUCKET_NAME"
        interval => 120
        delete => true
      }
    }

I'm not certain how durable the process is against data loss between the bucket moves, but for logs that aren't mission critical, this process is highly efficient considering the number of files being moved.
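
As a rough way to see how big that initial object listing is (and therefore how long Logstash will appear to hang before inserting anything), the keys under the prefix can be counted with the AWS CLI, for example:

    aws s3 ls s3://BUCKET_NAME/logs/2017/09/ --recursive | wc -l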

Upvotes: 1
