Reputation: 33
The table agent_task_base has 12,000,000 rows.
curl -XPUT 'localhost:9200/_river/myjdbc_river1/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "...",
    "user" : "...",
    "password" : "...",
    "sql" : "select * from agenttask_base where status=1",
    "index" : "my_jdbc_index1",
    "type" : "my_jdbc_type1"
  }
}'
curl -XPUT 'localhost:9200/_river/myjdbc_river2/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "...",
    "user" : "...",
    "password" : "...",
    "sql" : "select * from agenttask_base where status=1",
    "index" : "my_jdbc_index2",
    "type" : "my_jdbc_type2"
  }
}'
The two rivers execute together, but the final result is:

my_jdbc_index1 has 10,000,000+ rows
my_jdbc_index2 has 11,000,000+ rows

Why?
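
For reference, one way to compare the two counts, a sketch using the index and table names above:

# documents that actually landed in each index
curl -XGET 'localhost:9200/my_jdbc_index1/_count?pretty'
curl -XGET 'localhost:9200/my_jdbc_index2/_count?pretty'
# rows the rivers should have fetched, run against the source database:
#   SELECT COUNT(*) FROM agenttask_base WHERE status=1;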
Upvotes: 1
Views: 482
Reputation: 4304
I just figured this out after much trial and error, as I was experiencing the same issue.
What worked for me was defining the JDBC river parameters bulk_size and max_bulk_requests:
curl -XPUT 'localhost:9200/_river/myjdbc_river1/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "...",
    "user" : "...",
    "password" : "...",
    "sql" : "select * from agenttask_base where status=1",
    "index" : "my_jdbc_index1",
    "type" : "my_jdbc_type1",
    "bulk_size" : 160,
    "max_bulk_requests" : 5
  }
}'
A bulk size of 160 seemed to be my magic number; a bulk size of 500 was too high for my local install and would return a java.sql exception closing the database connection, but it was fine in my web server environment.
The bottom line is that you can tinker with these numbers to tune performance, but by setting them you should see your index document count match your SQL result count.
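
One caveat: river settings are read when the river is registered, so the safest route to apply new bulk_size / max_bulk_requests values is to delete and re-create the river. A sketch, assuming the river name from the question:

# unregister the old river first
curl -XDELETE 'localhost:9200/_river/myjdbc_river1'
# then re-register it with the tuned settings using the PUT shown above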
Upvotes: 0
Reputation: 4119
There is an issue on the GitHub repository of elasticsearch-river-jdbc (#143) which describes the same problem as the one above. Try reducing max_bulk_requests and letting Elasticsearch index again.
For more details see: https://github.com/jprante/elasticsearch-river-jdbc/issues/143#issuecomment-29550301
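
If earlier runs left a partially filled index behind, it can also help to drop it so the river re-indexes from scratch. A sketch, using the index name from the question:

# remove the incomplete index before re-running the river
curl -XDELETE 'localhost:9200/my_jdbc_index1'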
I hope this helps.
Upvotes: 1