Reputation: 2011
I created a Logstash config file with the JDBC input plugin to bring Oracle database tables into Elasticsearch, and I scheduled it to run every five minutes.
It works as expected, but the problem is that it inserts duplicate records on the 2nd and 3rd runs. How can we avoid inserting duplicate records into Elasticsearch?
Here is my Logstash config file with the JDBC input plugin:
input {
  jdbc {
    jdbc_driver_library => "D:\1SearchEngine\data\ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:XE"
    jdbc_user => "demo"
    jdbc_password => "1234567"
    schedule => "*/5 * * * *"
    statement => "select * from documents"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "schedule1_documents"
  }
}
Here is my documents table schema:
id        NUMBER NOT NULL
FileName  VARCHAR2
Path      VARCHAR2
File_size VARCHAR2
Upvotes: 2
Views: 995
Reputation: 217564
You need to use the id field from your documents table as the Elasticsearch document ID. Otherwise, ES generates a random id for each indexed event, so every scheduled run creates new documents instead of overwriting the existing ones.
Your output should look like this instead:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "schedule1_documents"
  document_id => "%{id}"   # add this line with the proper ID field
}
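As a side note, document_id makes re-runs idempotent, but the query still re-reads the whole table every five minutes. If the table grows, you can combine the fix above with the JDBC plugin's built-in tracking so each run only fetches new rows via :sql_last_value. A minimal sketch of the input section, assuming id is a numeric column that only ever increases (the last_run_metadata_path value here is illustrative):
input {
  jdbc {
    jdbc_driver_library => "D:\1SearchEngine\data\ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:XE"
    jdbc_user => "demo"
    jdbc_password => "1234567"
    schedule => "*/5 * * * *"
    # Only fetch rows added since the previous run; the plugin
    # persists :sql_last_value between runs in the metadata file.
    statement => "select * from documents where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    last_run_metadata_path => "D:\1SearchEngine\data\.documents_last_run"
  }
}
Note that this only catches newly inserted rows; if existing rows are updated in place, you would need to track a last-modified timestamp column instead.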
Upvotes: 3