databeata

Reputation: 11

How to avoid elasticsearch duplicate documents

How do I avoid elasticsearch duplicate documents?

The Elasticsearch index doc count (20,010,253) doesn't match the log line count (13,411,790).

From the file input plugin documentation:

File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.

NiFi:

A real-time NiFi pipeline copies logs from the NiFi server to the ELK server.
NiFi uses rolling log files.

log line count on the ELK server:

wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total 

Elasticsearch index doc count:

curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253 

Logstash pipeline conf file:

cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}

filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}

output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}

Upvotes: 0

Views: 754

Answers (1)

Mario Souza

Reputation: 629

You can use the fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html

This can e.g. be used to create consistent document ids when inserting events into Elasticsearch, allowing events in Logstash to cause existing documents to be updated rather than new documents to be created.
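Applied to the `test_4` pipeline from the question, this could look roughly like the sketch below: hash each log line into `[@metadata][fingerprint]` and pass it as the Elasticsearch `document_id`, so a line re-read after file rotation overwrites the existing document instead of creating a duplicate. (The `method` choice and use of the raw `message` as the hash source are assumptions; if distinct events can produce identical lines, hash a combination of fields instead.)

```
filter {
  fingerprint {
    # hash the raw line; @metadata fields are not indexed
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => "ip:9200"
    index => "test_4"
    # same line => same id => update instead of duplicate insert
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```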

Upvotes: 1

Related Questions