Reputation: 7417
I'm thinking about using Apache Solr. My database will have around 10,000,000 records. In the worst case, about 20 fields per record will be searchable/sortable. My problem is that these fields may change values frequently during the day. For example, I might update a field on 10,000 records at once, and this may happen anywhere from 0 to 1,000 times a day. The point is that each time I update a value in the database I want it updated in Solr too, so that searches always see the current data.
For those of you who have used Solr: how fast can re-indexing at these volumes be? Will such an update (a delete and re-add of the record, from what I've read) and its indexing take 5 seconds, 5 minutes, an hour, what? Assume it will be running on a good server.
Upvotes: 2
Views: 1199
Reputation: 9964
It's very hard to tell without actually trying. However, you need to know that Lucene and Solr currently don't support updating individual fields of a document (although there is some work in progress: https://issues.apache.org/jira/browse/LUCENE-3837), meaning that you need to re-index the whole record even if you only updated a single field.
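To make that concrete, here is a minimal SolrJ sketch of what an "update" looks like in practice: you rebuild the full document and re-add it, and Solr replaces the old copy that has the same uniqueKey. The core URL, field names, and the "id" uniqueKey are assumptions for illustration, and the builder-style client assumes a reasonably recent SolrJ.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReindexOneRecord {
    public static void main(String[] args) throws Exception {
        // Hypothetical Solr URL and core name.
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            // Even if only "price" changed in the database, the document must be
            // rebuilt with ALL of its searchable/sortable fields and re-added.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "record-42");     // uniqueKey: the old version gets replaced
            doc.addField("title", "Some title"); // unchanged fields still have to be sent
            doc.addField("price", 19.99);        // the field that actually changed
            // ... the other ~20 fields ...
            solr.add(doc);
            solr.commit();                       // make the change searchable
        }
    }
}
```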
Moreover, Lucene and Solr are much better at performing batch updates than single-document updates. To work around this, Solr has a handy commitWithin parameter that lets it group individual updates into fewer commits and improve throughput.
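Here is a hedged sketch of using commitWithin from SolrJ for the "10,000 records changed at once" case in the question: all the documents are sent in one batch, and the commitWithin deadline tells Solr it may commit them together instead of per document. Again, the core URL, field names, and the 10-second window are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchUpdateWithCommitWithin {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "record-" + i);
                doc.addField("price", 9.99);
                // ... remaining fields ...
                batch.add(doc);
            }
            // commitWithin = 10,000 ms: Solr buffers these adds and commits them
            // within roughly 10 seconds, instead of one commit per document.
            solr.add(batch, 10_000);
            // No explicit solr.commit() needed; the commitWithin deadline handles it.
        }
    }
}
```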
You should take this number with a grain of salt, but I often build indexes of millions of documents (~30 small fields) at a throughput of around 5,000 docs/s on very conventional hardware.
Upvotes: 3