Reputation: 5531
I have a Java job that writes documents to Solr using SolrCloud. Input data is transformed into a map of different entities and each entity is then written to the Solr collection corresponding to its entity type.
My code looks like:
    public void updateSolrDocumentsToCollection(String collectionName, Collection<SolrInputDocument> documents) {
        this.solrClient.setDefaultCollection(collectionName);
        UpdateRequest updateRequest = new UpdateRequest();
        updateRequest.add(documents);
        updateRequest.setCommitWithin(100); // 100 ms
        updateRequest.process(this.solrClient);
    }
This method is called once for each collection I write to, and then a final call writes one last document to an audit collection.
In integration tests, I wait until I can retrieve the document from the audit collection, and then retrieve the documents from the entity collections.
The problem
I assume that because audit is written to last, once I can retrieve from audit I can also retrieve from any other collection I've previously written to. However, this does not appear to be true. About 1% of the time, an audit document is retrieved, but the tests fail because the other collections do not yet contain their documents.
Even adding a Thread.sleep(1000) before retrieving documents doesn't help. That's ten times the commit window, so surely I should be guaranteed to see documents?
How can I guarantee that all documents are searchable?
Upvotes: 2
Views: 929
Reputation: 2077
Are you using SolrCloud or a Master/Slave configuration? If you have a master/slave setup, commitWithin might not work as expected. See here.
The commitWithin settings allow forcing document commits to happen in a defined time period. This is used most frequently with Near Real Time Searching, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:
If not, can you try calling commit() directly from your code and see if that works?
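A minimal sketch of the direct-commit idea, written against Solr's plain HTTP update handler with the JDK's HttpClient (Java 11+) instead of SolrJ. The class name ExplicitCommit, the helper names, and the base URL used below are assumptions for illustration only; commit=true and waitSearcher=true are standard update-handler parameters. With SolrJ, this.solrClient.commit(collectionName) should have a similar effect.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the original job.
final class ExplicitCommit {

    // Builds the update-handler URL that requests a commit and waits for the new searcher.
    // Assumes the collection name needs no URL encoding.
    static URI commitUri(String solrBaseUrl, String collectionName) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return URI.create(base + "/" + collectionName + "/update?commit=true&waitSearcher=true");
    }

    // Sends the commit request and returns the HTTP status code (200 on success).
    static int commit(HttpClient http, String solrBaseUrl, String collectionName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(commitUri(solrBaseUrl, collectionName))
                .GET()
                .build();
        return http.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

Calling something like ExplicitCommit.commit(HttpClient.newHttpClient(), "http://localhost:8983/solr", collectionName) after each entity write, and before writing the audit document, would take commitWithin timing out of the picture for the test. Explicit commits on every write are generally expensive, so this is probably best kept to tests or the end of a batch.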
You can also check the Solr logs to see how often your commits are taking place. If those commits have openSearcher=true, a new searcher is opened with every commit. If you are indexing in bulk, you might benefit from keeping this as false.
commitWithin issues a soft commit, which opens a new searcher. It is possible that you are issuing commits every 100ms but opening a new searcher takes longer than that.
Try increasing your commitWithin to, say, 500ms or 1000ms and see if that works.
Upvotes: 1
Reputation: 16035
You can check the number of uncommitted documents using the MBean Request Handler (/admin/mbeans). This handler offers programmatic access to the information provided on the Plugin/Stats page of the Admin UI.
Use the param stats=true to get stats properties and check for docsPending (the number of documents pending commit) in the "UPDATEHANDLER" category. You can restrict results by category name using the param cat=UPDATEHANDLER; however, it is not possible to access docsPending directly by query (afaik).
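Since docsPending cannot be queried directly, one option is to fetch the stats and pull the value out client-side. The following is a rough sketch using only the JDK (Java 11+); the class name PendingDocs and its helpers are hypothetical, and the regex assumes the JSON contains a numeric key ending in docsPending, possibly with a dotted prefix depending on the Solr version. In SolrCloud the request is likely served by a single core, so the number may not cover every shard or replica.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading docsPending from the MBean Request Handler.
final class PendingDocs {

    // Matches "docsPending": 7 and prefixed variants such as "some.prefix.docsPending": 7.
    private static final Pattern DOCS_PENDING =
            Pattern.compile("\"[\\w.]*docsPending\"\\s*:\\s*(\\d+)");

    // Builds the mbeans URL restricted to the UPDATEHANDLER category.
    static String statsUrl(String solrBaseUrl, String collectionName) {
        return solrBaseUrl + "/" + collectionName
                + "/admin/mbeans?wt=json&stats=true&cat=UPDATEHANDLER";
    }

    // Extracts the first docsPending value from the response body, if any.
    static OptionalLong docsPending(String mbeansJson) {
        Matcher matcher = DOCS_PENDING.matcher(mbeansJson);
        return matcher.find()
                ? OptionalLong.of(Long.parseLong(matcher.group(1)))
                : OptionalLong.empty();
    }

    // Fetches the stats over HTTP and returns docsPending, if present in the response.
    static OptionalLong fetchDocsPending(HttpClient http, String solrBaseUrl, String collectionName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(statsUrl(solrBaseUrl, collectionName)))
                .GET()
                .build();
        String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
        return docsPending(body);
    }
}
```

A test could poll fetchDocsPending for each entity collection until it reports 0 before querying. Note that a value of 0 only says nothing is waiting for a commit on the core that answered.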
Example query:
https://host.example.com/solr/collectionName/admin/mbeans?wt=json&indent=true&stats=true
"/admin/" handlers are registered implicitly as of Solr 5.0.0; prior versions need an explicit registration in solrconfig.xml.
Upvotes: 0