James Jiang

Reputation: 2193

Elasticsearch daily rolling index contains duplicate _id

This may be a silly question, but I am using a daily rolling index to store my events with Logstash. The config is as simple as:

input { ..source.. }
filter { ..filter.. }
output {
  elasticsearch {
    document_id => "%{my_own_guarantee_unique_id}"
    # dd is day of month; DD would be day of year
    index => "myindex-%{+YYYY.MM.dd}"
  }
}

What I found is that if events with the same my_own_guarantee_unique_id appear on different days, they are created once in each of those daily indexes, i.e. you can find an event with _id = 123 in both myindex-2015.06.21 and myindex-2015.06.22.

Is this kind of duplication the out-of-the-box behavior? What should I do to avoid it? Any suggestions or reading would be appreciated, thanks!

Upvotes: 0

Views: 1357

Answers (2)

kevh

Reputation: 323

I had the exact same issue: several duplicated documents with the same id but in different indexes (I have one index per date).

What worked for me was to generate a field with the index name and reuse it in the output part of the logstash configuration.

index => "%{index_name}"
document_id => "%{clickID}"
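
One way such a field could be built, as a rough sketch: parse the event's own timestamp with the date filter, then derive the index name from it, so that a re-sent event maps to the same day's index. The source field name `event_time` and the `myindex-` prefix are assumptions for illustration; the answer does not say how `index_name` was actually generated.

```
filter {
  # Assumption: each event carries its original time in an ISO8601 field named "event_time".
  date {
    match => ["event_time", "ISO8601"]
  }
  # @timestamp now holds the event's own time, so the date below does not depend on arrival day.
  mutate {
    add_field => { "index_name" => "myindex-%{+YYYY.MM.dd}" }
  }
}
output {
  elasticsearch {
    index => "%{index_name}"
    document_id => "%{clickID}"
  }
}
```

Note that `index_name` is stored in the document as a regular field in this sketch; placing it under `[@metadata]` instead would keep it out of the indexed event.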

Upvotes: 0

Jettro Coenradie

Reputation: 4733

Since you are using multiple indexes, one for every day, the same _id can exist in more than one of them. What makes a document unique is the uid, which is a combination of index, type, and id. To my knowledge, there is no way to change this in Elasticsearch.
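
As a sketch of what this implies: the configured document_id only deduplicates when repeated events land in the same index. The variant below drops the date suffix so that every event goes to one index. The name `myindex` is taken from the question, and giving up daily rollover is an assumption that may not suit every retention setup.

```
output {
  elasticsearch {
    # Single fixed index: a repeated document_id replaces the earlier document
    # instead of creating a second copy in another day's index.
    index => "myindex"
    document_id => "%{my_own_guarantee_unique_id}"
  }
}
```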

Upvotes: 1
