Karthikeyan

Reputation: 2011

Logstash schedule inserting duplicate records into Elasticsearch

I created a Logstash config file with the JDBC input plugin to bring Oracle database tables into Elasticsearch, and I scheduled it to run every five minutes.

It is working as expected, but the problem is that it inserts duplicate records on the 2nd and 3rd runs. How can we avoid inserting duplicate records into Elasticsearch?

Please find my Logstash config file with the JDBC input plugin below:

input {
  jdbc {
    jdbc_driver_library => "D:\1SearchEngine\data\ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:XE"
    jdbc_user => "demo"
    jdbc_password => "1234567"
    schedule => "*/5 * * * *"
    statement => "select * from documents"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "schedule1_documents"
  }
}

Please find my documents table schema below:

id        ---> NUMBER, NOT NULL
FileName  ---> VARCHAR2
Path      ---> VARCHAR2
File_size ---> VARCHAR2
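
For reference, a rough DDL sketch of that table; the VARCHAR2 lengths here are placeholders, not the actual ones:

CREATE TABLE documents (
  id        NUMBER NOT NULL,   -- row id
  FileName  VARCHAR2(255),     -- placeholder length
  Path      VARCHAR2(255),     -- placeholder length
  File_size VARCHAR2(255)      -- placeholder length
);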

Upvotes: 2

Views: 995

Answers (1)

Val

Reputation: 217564

You need to use the id field from your documents table. Otherwise, ES will create an id itself.

So your output should look like this instead:

  elasticsearch {
    hosts => ["localhost:9200"]
    index => "schedule1_documents"
    document_id => "%{id}"              <-- add this line with the proper ID field
  }
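
With document_id set to the id column, every row maps to the same Elasticsearch _id on each run, so the 2nd and 3rd scheduled runs overwrite the existing documents instead of adding new ones. As a rough sketch, the full output section from the question then becomes:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "schedule1_documents"
    document_id => "%{id}"    # the id column from the documents table becomes the ES _id
  }
}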

Upvotes: 3
