Ben Watson

Reputation: 5531

Guaranteeing a Solr Commit has occurred

I have a Java job that writes documents to Solr using SolrCloud. Input data is transformed into a map of different entities and each entity is then written to the Solr collection corresponding to its entity type.

My code looks like:

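import java.io.IOException;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public void updateSolrDocumentsToCollection(String collectionName, Collection<SolrInputDocument> documents)
        throws SolrServerException, IOException {
    this.solrClient.setDefaultCollection(collectionName); // solrClient: a CloudSolrClient
    UpdateRequest updateRequest = new UpdateRequest();
    updateRequest.add(documents);
    updateRequest.setCommitWithin(100); // ask Solr to commit within 100 ms
    updateRequest.process(this.solrClient);
}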

This method is called once for each collection to which I'm writing, and then a final call is made to write one last document to an audit collection.

In integration tests, I wait until I can retrieve the document from the audit collection, and then retrieve the documents from the entity collections.
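The wait described above can be sketched as a small polling helper. This is a minimal sketch, not the asker's actual test code: the class `Poll` and method `waitUntil` are hypothetical names, and the condition passed in (for example "the audit document is retrievable") is whatever query the test performs.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a timeout expires. */
public final class Poll {

    private Poll() {}

    /**
     * Re-evaluates condition every intervalMillis until it returns true.
     * Returns true if the condition held before timeoutMillis elapsed, false otherwise.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison against the deadline
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }
}
```

The same helper can be applied to every entity collection the test asserts on, instead of only to the audit collection, so the test no longer relies on the audit document implying visibility elsewhere.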

The problem

I assume that, because audit is written last, once I can retrieve the audit document I can also retrieve documents from every collection written before it. However, this does not appear to be true. About 1% of the time the audit document is retrievable, but the tests fail because the other collections do not yet contain their documents.

Even adding a Thread.sleep(1000) before retrieving documents doesn't help. That's ten times the commit window, so surely I should be guaranteed to see documents?

How can I guarantee that all documents are searchable?

Upvotes: 2

Views: 929

Answers (2)

jay

Reputation: 2077

Are you using SolrCloud or a master/slave configuration? With master/slave, commitWithin might not work as you expect. See here.

The commitWithin settings allow forcing document commits to happen in a defined time period. This is used most frequently with Near Real Time Searching, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:

If not, can you try calling commit() directly from your code and see whether that works?
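A minimal sketch of an explicit commit, assuming Java 11+ and that each collection's default /update handler is reachable under a base URL such as http://localhost:8983/solr (a placeholder). It uses plain java.net.http rather than SolrJ so the example is self-contained; with SolrJ the comparable call would be solrClient.commit(collectionName, true, true), where the booleans are waitFlush and waitSearcher. The request parameter commit=true asks for a hard commit, and waitSearcher=true asks Solr not to respond until the new searcher is open. Sending the commit as a bare GET is an assumption about the handler configuration; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: issue an explicit hard commit to one Solr collection over HTTP. */
public final class ExplicitCommit {

    private ExplicitCommit() {}

    /** Builds e.g. http://localhost:8983/solr/entities/update?commit=true&waitSearcher=true */
    public static URI commitUri(String solrBaseUrl, String collection) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return URI.create(base + "/" + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "/update?commit=true&waitSearcher=true");
    }

    /** Sends the commit; returns normally only if Solr answered with HTTP 200. */
    public static void commit(HttpClient http, String solrBaseUrl, String collection)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(commitUri(solrBaseUrl, collection))
                .GET()
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Commit to " + collection + " failed: HTTP "
                    + response.statusCode() + " " + response.body());
        }
    }
}
```

Calling commit once for each entity collection before writing the audit document means the audit write only happens after those commits have returned, which should make the "audit implies everything else" assumption hold, at the cost of slower, heavier commits than commitWithin.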

You can also check the Solr logs to see how often your commits are taking place. If those commits have openSearcher=true, a new searcher is opened with every commit. If you are indexing in bulk, you might benefit from keeping this set to false.

commitWithin issues a soft commit, which opens a new searcher. It is possible that you are issuing commits every 100 ms but opening a new searcher takes longer than that.

Try increasing commitWithin to, say, 500 ms or 1000 ms and see whether that helps.

Upvotes: 1

EricLavault

Reputation: 16035

You can check the number of uncommitted documents using the MBean Request Handler (/admin/mbeans). This handler offers programmatic access to the information provided on the Plugin/Stats page of the Admin UI.

Use the parameter stats=true to get the stats properties, and check docsPending (the number of documents pending commit) in the "UPDATEHANDLER" category. You can restrict the results by category name with cat=UPDATEHANDLER; however, as far as I know, it is not possible to query docsPending directly.

Example query:

https://host.example.com/solr/collectionName/admin/mbeans?wt=json&indent=true&stats=true

The /admin/ handlers are registered implicitly as of Solr 5.0.0; earlier versions need them registered explicitly in solrconfig.xml.
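Since docsPending cannot be requested on its own, a client has to fetch the stats and extract the value itself. A minimal, dependency-free sketch follows; the class and method names are hypothetical. It uses a regular expression rather than a JSON parser, and it assumes the key appears either as "docsPending" or as a dotted metric name ending in docsPending, since the exact layout of the stats output is an assumption that may differ between Solr versions. Fetching the URL is left to any HTTP client.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: build the MBean stats URL and pull docsPending out of the JSON reply. */
public final class DocsPending {

    // Matches "docsPending": N as well as dotted keys such as "X.Y.docsPending": N
    private static final Pattern DOCS_PENDING =
            Pattern.compile("\"(?:[\\w.]+\\.)?docsPending\"\\s*:\\s*(\\d+)");

    private DocsPending() {}

    /** e.g. https://host.example.com/solr/collectionName/admin/mbeans?wt=json&stats=true&cat=UPDATEHANDLER */
    public static String mbeansUrl(String solrBaseUrl, String collection) {
        return solrBaseUrl + "/" + collection + "/admin/mbeans?wt=json&stats=true&cat=UPDATEHANDLER";
    }

    /** Returns the first docsPending value found in the response body, or empty if none is present. */
    public static OptionalLong parseDocsPending(String json) {
        Matcher m = DOCS_PENDING.matcher(json);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }
}
```

A test could poll this value until it reaches 0 for each collection before asserting on the documents.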

Upvotes: 0
