Reputation: 2367
Currently, I'm using Solr as a search server. My issue is that I do a lot of real-time indexing on the data set (although each document is very small, only 100 characters). I was wondering how I could speed this up by removing the need to commit, autocommit, etc. I just want to add documents to the index; I'm not too worried about the dataset being volatile. I use a Node.js library to index into Solr. Here is a snippet:
    var doc = {
        id: id.id,
        text_t: id.words
    };
    var callback = function(err, response) {
        if (err) throw err;
        solr.commit();
    };
    solr.add(doc, callback);
If I remove solr.commit(), the doc is not indexed (even though I thought commit() just persisted it to disk).
Upvotes: 3
Views: 2076
Reputation: 9964
The upcoming version of Solr will have a feature called soft commit, which might interest you. A soft commit is similar to a commit but does not perform an fsync to ensure that data has been written to disk. This means that you might lose data (for example, if the power supply fails, but not if Solr crashes while the server keeps running). In exchange, a soft commit is likely to be much faster than a regular (hard) commit, since the OS can leverage the buffer cache.
With a current version of Solr, a good trade-off would be to use the commitWithin feature of Solr's UpdateHandler. For example, by using 10000 as the value of the commitWithin parameter, you would ensure that any document is committed at most 10 seconds after it has been added to the index, and you would keep the commit rate under 1 commit every 10 seconds. Lower values of commitWithin provide better freshness of the data, while higher values stress the disks less.
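As a rough illustration of the commitWithin idea, here is a minimal sketch that builds an add request carrying commitWithin in the JSON update message, rather than going through the Node.js library from the question (whose API for this is not shown in the thread). The helper name buildAddRequest, the base URL, and the /update/json path are assumptions for the example; check them against your Solr version and setup.

```javascript
// Hypothetical helper (not part of any Solr client library): builds the
// pieces of an HTTP request that adds one document with commitWithin set.
// Assumption: the JSON update handler is reachable at <baseUrl>/update/json.
function buildAddRequest(baseUrl, doc, commitWithinMs) {
  // Strip trailing slashes so the path is joined cleanly.
  var root = baseUrl.replace(/\/+$/, "");
  return {
    url: root + "/update/json",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // commitWithin is given in milliseconds, alongside the document.
    body: JSON.stringify({
      add: { doc: doc, commitWithin: commitWithinMs }
    })
  };
}

// Example: ask Solr to commit this document within 10 seconds.
var request = buildAddRequest(
  "http://localhost:8983/solr",           // assumed local Solr URL
  { id: "42", text_t: "hello world" },
  10000
);
console.log(request.url);
console.log(request.body);
```

The returned object can be handed to whatever HTTP client you already use. No explicit commit call is needed afterwards, since Solr schedules the commit itself within the requested window.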
Upvotes: 6
Reputation: 9015
Similar to a database transaction, a document will not show up in Solr search results until a commit. The problem is that Solr commits are very expensive, as you have noticed. Sadly, there is no way to get around this right now; Solr does not work well for real-time search. The way to improve performance when adding multiple documents is to add them as a batch and commit the whole set of documents once.
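The batching idea above could be sketched roughly like this. createBatcher and sendBatch are hypothetical names introduced for the example; sendBatch is where you would add the whole array of documents and then commit once. Whether your Solr client accepts an array in a single add call is an assumption to verify.

```javascript
// Hypothetical batcher: collects documents in memory and passes them to
// `sendBatch` in groups, so that one commit can cover many documents.
function createBatcher(sendBatch, maxSize) {
  var pending = [];

  // Send everything collected so far, if there is anything to send.
  function flush() {
    if (pending.length === 0) return;
    var batch = pending;
    pending = [];
    sendBatch(batch);
  }

  // Queue one document; flush automatically once the batch is full.
  function add(doc) {
    pending.push(doc);
    if (pending.length >= maxSize) flush();
  }

  return {
    add: add,
    flush: flush,
    size: function () { return pending.length; }
  };
}

// Example with a stand-in sender that only logs the batch size.
// In real use, sendBatch would add the docs to Solr and commit once.
var batcher = createBatcher(function (docs) {
  console.log("sending " + docs.length + " docs, then one commit");
}, 3);

batcher.add({ id: "1", text_t: "first" });
batcher.add({ id: "2", text_t: "second" });
batcher.add({ id: "3", text_t: "third" });   // batch is full, triggers a send
batcher.add({ id: "4", text_t: "fourth" });
batcher.flush();                              // send the remainder
```

A larger maxSize means fewer commits but a longer delay before documents become searchable, so the batch size is the knob for trading freshness against indexing cost. In a long-running process you would probably also call flush() on a timer so a partly filled batch does not wait indefinitely.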
Ideally, you could use Near Realtime Search, but that is still in development for Solr 4.0.
Upvotes: 1