Reputation: 51
We need to create our index in Solr and it is taking way too long. We have about 800k records and it seems like it is going to take 15 to 20 days at the rate at which it is indexing. We are looking for a one time index for now. Any suggestions?
Upvotes: 5
Views: 11781
Reputation: 5508
From my experience, indexing big chunks of data can take a while. The index I'm working on has 2M items (size: 10 GB), and a full index from the DB takes about 40 hours.
There are some factors that might be slowing you down:
Upvotes: 4
Reputation: 32392
I wrote a system to index about 300,000 records, and after some performance tests I configured SOLR to commit every 5 minutes. Look at solrconfig.xml: there are several directives related to committing changes, but you should not commit after each record update. Either commit after every 100-200 records or commit every 5 minutes. This is especially important during a reindex of all data.
I chose 5 minutes because it is a reasonable setting for ongoing sync as well, since we poll our db for changes every minute. We tell users that it takes 5 minutes or so for changes to flow through to SOLR, and so far everyone is happy with that.
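As a rough sketch, a time-based commit policy like the one described above can be expressed with the `autoCommit` directive in solrconfig.xml (the 5-minute value mirrors the setting mentioned here; adjust to taste, and check your SOLR version's documentation for the exact options it supports):

```xml
<!-- solrconfig.xml: commit automatically every 5 minutes (300000 ms),
     or after 200 pending documents, instead of committing per record -->
<autoCommit>
  <maxTime>300000</maxTime>
  <maxDocs>200</maxDocs>
</autoCommit>
```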
Upvotes: 4
Reputation: 52809
Is there any reason why the indexing takes so much time? Are any preprocessing steps taking time? This seems like an unusually long time.
Are these database records or rich documents?
How are you indexing the data? Are you running frequent commits or optimizations?
How are system memory, CPU, and disk space behaving?
Might want to revisit some settings in solrconfig.xml
If all of the above seems fine, you can try one more option:
Create separate cores and run parallel jobs to index the data. After the indexing completes, you can either merge the indexes or use distributed search.
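The parallel-cores idea above can be sketched as follows. This is a minimal illustration, not a full indexer: `index_batch` is a hypothetical stand-in for whatever posts records to one core's `/update` handler, and the core names are made up for the example. Threads are used because indexing jobs are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor

CORES = ["core0", "core1", "core2", "core3"]  # hypothetical per-core endpoints

def index_batch(core, records):
    # Placeholder: a real job would POST these records to
    # http://localhost:8983/solr/<core>/update and commit periodically.
    return len(records)

def split_round_robin(records, n):
    """Deal records into n roughly equal chunks, one chunk per core."""
    return [records[i::n] for i in range(n)]

def parallel_index(records):
    # One worker per core, each indexing its own chunk concurrently.
    chunks = split_round_robin(records, len(CORES))
    with ThreadPoolExecutor(max_workers=len(CORES)) as pool:
        counts = list(pool.map(index_batch, CORES, chunks))
    return sum(counts)  # total records handed to the workers
```

Once all jobs finish, the per-core indexes can be combined (for example via SOLR's index-merge support) or queried together with distributed search.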
Upvotes: 0