Reputation: 5824
I have a four-node SolrCloud setup (version 4.10), and my collection has 4 shards and 2 replicas. My application provides search alongside real-time data ingestion; the ingestion and search processes run in parallel.
Every day the data load is around 2-3 million records (insert/update operations), and the total document count is 80 million+.
The problem we are facing is that Solr returns very inconsistent record counts during peak data ingestion.
Sample query:
for i in `seq 1 50`;
do
curl 'http://localhost:8888/solr/OPTUM/select?q=*:*&wt=json&indent=true'|grep numFound|rev|cut -d'{' -f1 |rev
done
The numFound value in the response sometimes shows far fewer documents than are actually present in Solr.
Please suggest whether I need to make any configuration change to get a consistent count.
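For reference, the same check can be sketched in Python, parsing the JSON instead of piping through grep/rev/cut. This is only a rough sketch: the URL is copied from the sample query above, rows=0 is my addition so that only the count is returned, and the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumed endpoint, taken from the sample query; rows=0 is an addition
# so that Solr returns only the count and no documents.
SELECT_URL = "http://localhost:8888/solr/OPTUM/select?q=*:*&rows=0&wt=json"


def num_found(response_text):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def summarize(counts):
    """Return (min, max, number of distinct values) for a list of counts."""
    return min(counts), max(counts), len(set(counts))


def poll(url=SELECT_URL, attempts=50):
    """Run the same query repeatedly and collect numFound each time."""
    counts = []
    for _ in range(attempts):
        with urlopen(url) as resp:
            counts.append(num_found(resp.read().decode("utf-8")))
    return counts
```

Calling summarize(poll()) during peak ingestion should make the spread obvious: if the count were consistent, min and max would be equal and there would be a single distinct value.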
Upvotes: 11
Views: 1901
Reputation: 5824
I have not yet found the root cause of this problem, but I have a temporary workaround that resolves it.
I had been using the SolrJ 4.x soft-commit method (UpdateRequest.setCommitWithin(commitWithinMs)). I commented that out and now leave the whole commit strategy to the Solr side:
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>
</autoSoftCommit>
I am now getting consistent results from Solr, but I am still not sure why the SolrJ client-side commit wasn't working.
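To illustrate what "leaving commits to Solr" means on the client side, here is a minimal Python sketch (not the SolrJ code I actually run). It builds a JSON update request that carries no commit or commitWithin parameter, so visibility depends only on the autoCommit/autoSoftCommit settings above. The function name and the sample document are hypothetical, and it assumes the /update handler accepts a JSON array of documents.

```python
import json
from urllib.request import Request


def build_update_request(base_url, collection, docs):
    """Build (but do not send) a JSON update request with no commit parameters."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}/{collection}/update",  # no ?commit=true, no ?commitWithin=...
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(req); the documents should become searchable once the next soft commit fires, which is about 2 seconds with the settings above.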
Upvotes: 0
Reputation: 21
The problem seems to be related to how you query your distributed setup. You said "my collection has 4 shards, 2 replicas" across 4 nodes. Your inconsistent results may be because each request is routed to a shard by a load-balancing algorithm, so a different shard is used every time and returns a different (subset) result set.
See the Distributed Requests section of the Solr documentation.
Try adding something like:
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=nodehost1:7574/solr,nodehost2:8983/solr,nodehost3:8983/solr,nodehost4:8983/solr
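If you build the query in code, a small helper like the following could assemble the shards parameter. Treat it as a sketch: the function name is made up, the host names are the placeholders from the URL above, and rows=0 and wt=json are extras I added so that only the count comes back.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, shard_urls, query="*:*"):
    """Build a /select URL that names the shards to query explicitly."""
    params = {
        "q": query,
        "rows": 0,        # assumption: only numFound is of interest
        "wt": "json",
        "shards": ",".join(shard_urls),
    }
    # Keep '*', ':', '/' and ',' unescaped so the URL stays readable.
    return f"{base_url}/{collection}/select?" + urlencode(params, safe="*:/,")
```

For example, build_select_url("http://localhost:8983/solr", "gettingstarted", ["nodehost1:7574/solr", "nodehost2:8983/solr"]) produces a URL of the same shape as the one above, with your real shard addresses substituted.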
Upvotes: 1