Reputation: 11
How do I avoid elasticsearch duplicate documents?
The elasticsearch index docs count (20,010,253) doesn’t match with logs line count (13,411,790).
File input plugin.
File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.
nifi:
real time nifi pipeline copies logs from nifi server to elk server.
nifi has rolling log files.
logs line count on elk server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
file {
path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
type => "test_4"
sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
}
}
filter {
if [type] == "test_4" {
grok {
match => {
"message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
}
}
}
}
output {
if [type] == "test_4" {
elasticsearch {
hosts => "ip:9200"
index => "test_4"
}
}
else {
stdout {
codec => rubydebug
}
}
}
Upvotes: 0
Views: 754
Reputation: 629
You can use fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html
This can e.g. be used to create consistent document ids when inserting events into Elasticsearch, allowing events in Logstash to cause existing documents to be updated rather than new documents to be created.
Upvotes: 1