bigfanofcpp

Reputation: 33

Two Elasticsearch JDBC rivers: index document counts do not match the database row count

The table agent_task_base has 12000000 rows

curl -XPUT 'localhost:9200/river/myjdbc_river1/meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "...",
        "user" : "...",
        "password" : "...",
        "sql" : "select * from agenttask_base where status=1",
        "index" : "my_jdbc_index1",
        "type" : "my_jdbc_type1"
    }
}'

curl -XPUT 'localhost:9200/river/myjdbc_river2/meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "...",
        "user" : "...",
        "password" : "..",
        "sql" : "select * from agenttask_base where status=1",
        "index" : "my_jdbc_index2",
        "type" : "my_jdbc_type2"
    }
}'

The two rivers run at the same time, but the final result is:

my_jdbc_index1 has 10000000+ rows

my_jdbc_index2 has 11000000+ rows

Why do the two indices end up with different counts?

Upvotes: 1

Views: 482

Answers (2)

Jay Rizzi

Reputation: 4304

I just figured this out after much trial and error, as I was experiencing the same issue.

What worked for me was defining the JDBC river parameters bulk_size and max_bulk_requests:

curl -XPUT 'localhost:9200/river/myjdbc_river1/meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "...",
        "user" : "...",
        "password" : "...",
        "sql" : "select * from agenttask_base where status=1",
        "index" : "my_jdbc_index1",
        "type" : "my_jdbc_type1",
        "bulk_size" : 160,
        "max_bulk_requests" : 5  
    }
}'

A bulk size of 160 seemed to be my magic number. A bulk size of 500 was too high for my local install and would return a java.sql exception that closed the database connection, but it was fine in my web server environment.

Bottom line: you can tinker with these numbers to tune performance, but once you set them you should see your index document count match your SQL result count.
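To check whether the counts line up after a river run, something like the following might help. It is only a sketch: `es_count` and the `ES_HOST` variable are names I made up for illustration, and it assumes the node from the question is reachable at localhost:9200 and that the `_count` endpoint returns compact JSON containing a `"count":N` field.

```shell
# Hypothetical helper: print the document count of one index.
# ES_HOST is an assumed variable; it defaults to the localhost:9200
# address used in the question.
es_count() {
    # $1 is the index name, e.g. my_jdbc_index1
    curl -s "${ES_HOST:-localhost:9200}/$1/_count" \
        | sed -n 's/.*"count":\([0-9]*\).*/\1/p'
}
```

Run `es_count my_jdbc_index1` and `es_count my_jdbc_index2`, then compare both numbers with the result of `select count(*) from agenttask_base where status=1` on the database side.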

Upvotes: 0

rufer7

Reputation: 4119

There is an issue on the elasticsearch-river-jdbc GitHub repository (#143) that describes the same problem you describe above. Try reducing max_bulk_requests and letting Elasticsearch index the data again.

For more details see: https://github.com/jprante/elasticsearch-river-jdbc/issues/143#issuecomment-29550301

I hope this helps.

Upvotes: 1
