sedavidw

Reputation: 11741

Logstash unable to index into Elasticsearch because it can't parse date

I am getting a lot of the following errors when running Logstash to index documents into Elasticsearch:

[2019-11-02T18:48:13,812][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index-2019-09-28", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x729fc561>], :response=>{"index"=>{"_index"=>"my-index-2019-09-28", "_type"=>"doc", "_id"=>"BhlNLm4Ba4O_5bsE_PxF", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [timestamp] of type [date] in document with id 'BhlNLm4Ba4O_5bsE_PxF'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2019-09-28 23:32:10.586\" is malformed at \" 23:32:10.586\""}}}}}

It clearly has a problem parsing the date, but I don't see what that problem could be. Below are excerpts from my logstash config and the Elasticsearch template. I include these because I'm trying to use the timestamp field to build the index name: I copy timestamp into @timestamp, format that to YYYY-MM-dd, and use the stored metadata to name my index.

Logstash config:

input {
      stdin { type => stdin }
}
filter {
  csv {
     separator => " "   # this is a tab (\t), not just whitespace
     columns => ["timestamp","field1", "field2", ...]
     convert => {
       "timestamp" => "date_time"
       ...
     }
  }
}

filter {
  date {
    match => ["timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS'"]
    target => "@timestamp"
  }
}

filter {
  date_formatter {
    source => "@timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}


filter {
  mutate {
    remove_field => [
      "@timestamp",
      ...
    ]
  }
}

output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "my-index-%{[@metadata][date]}"
    template => "my-config.json"
    template_name => "my-index-*"
    region => "us-east-1"
  }
}

Template:

{
    "template" : "my-index-*",
    "mappings" : {
      "doc" : {
        "dynamic" : "false",
        "properties" : {
          "timestamp" : {
            "type" : "date"
          },
          ...
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "12",
        "number_of_replicas" : "0"
      }
    }
}

When I inspect the raw data, it looks just like what the error shows, and that appears to be well formed, so I'm not sure what my issue is.

Here is an example row; it's been redacted, but the problem field is untouched and is the first one:

2019-09-28 07:29:46.454 NA  2019-09-28 07:29:00 someApp 62847957802 62847957802

Upvotes: 1

Views: 2486

Answers (2)

sedavidw

Reputation: 11741

Turns out the source of the problem was the convert block: Logstash is unable to understand the time format specified in the file. To address this I renamed the original timestamp field to unformatted_timestamp and applied the date filter I was already using:
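For context, a minimal sketch of how the adjusted csv block might look (assuming the same column layout as in the question; the convert entry for the timestamp is dropped, since the date filter below does the parsing):

filter {
  csv {
     separator => " "   # this is a tab (\t), not just whitespace
     columns => ["unformatted_timestamp", "field1", "field2", ...]
  }
}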

filter {
  date {
    match => ["unformatted_timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS'"]
    target => "timestamp"
  }
}

filter {
  date_formatter {
    source => "timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}

Upvotes: 1

leandrojmp

Reputation: 7473

You are parsing your lines using the csv filter with the separator set to a space, but your date is also split by a space. This way your first field, named timestamp, only gets the date 2019-09-28, and the time ends up in the field named field1.
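For example, splitting the sample row from the question on a single space would assign the fields like this (an illustration of the split, not actual Logstash output):

timestamp => "2019-09-28"
field1    => "07:29:46.454"
field2    => "NA"
...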

You can solve your problem by creating a new field named date_and_time with the contents of the date and time fields, for example:

csv {
    separator => " "
    columns => ["date","time","field1","field2","field3","field4","field5","field6"]
}
mutate {
    add_field => { "date_and_time" => "%{date} %{time}" }
}
mutate {
    remove_field => ["date","time"]
}

This will create a field named date_and_time with the value 2019-09-28 07:29:46.454. You can now use the date filter to parse this value into the @timestamp field, the default for Logstash.

date {
    match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
}

This will leave you with two fields holding the same value, date_and_time and @timestamp. Since @timestamp is the Logstash default, I would suggest keeping it and removing the date_and_time field created before.

mutate {
    remove_field => ["date_and_time"]
}

Now you can create your date-based index using the format YYYY-MM-dd, and Logstash will extract the date from the @timestamp field. Just change the index line in your output to this one:

index => "my-index-%{+YYYY-MM-dd}"

Upvotes: 0
