Thomas James

Reputation: 511

RavenDB map/reduce index performance for bulk insert of 3 million records

I am currently using RavenDB in a proof of concept for a simple dashboard application that provides an aggregated view over events coming into a system. Let's say, for example, that the user can view the data at a granularity of by hour (for a day), by day, by month or by year.
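For context, below is a minimal sketch of the kind of map/reduce index this implies for the hourly granularity, written against the classic RavenDB .NET client. The Event document type and the EventsByHour index name are hypothetical stand-ins, not taken from my actual code.

```csharp
using System;
using System.Linq;
using Raven.Client.Indexes;

public class Event
{
    public string Id { get; set; }
    public DateTime Timestamp { get; set; }
}

public class EventsByHour : AbstractIndexCreationTask<Event, EventsByHour.Result>
{
    public class Result
    {
        public DateTime Hour { get; set; }
        public int Count { get; set; }
    }

    public EventsByHour()
    {
        // Map each event into the hour bucket it falls into.
        Map = events => from e in events
                        select new
                        {
                            Hour = new DateTime(e.Timestamp.Year, e.Timestamp.Month,
                                                e.Timestamp.Day, e.Timestamp.Hour, 0, 0),
                            Count = 1
                        };

        // Reduce by summing the counts per hour bucket.
        Reduce = results => from r in results
                            group r by r.Hour into g
                            select new
                            {
                                Hour = g.Key,
                                Count = g.Sum(x => x.Count)
                            };
    }
}
```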

I have 3 million existing events to import and index, and I'm looking for the best/most performant way to do this after a number of less-than-successful attempts.

Please note this question isn't about the performance of the application once the data and indexes have been generated; that part is very good.

So I have:

I can import the data without issue provided the indexes don't exist. However, if the indexes do exist, I consistently get OutOfMemoryExceptions after about 45 minutes of index processing.

Can the indexing process be tweaked, and what would suitable values be?

Alternatively, I'm happy to hear suggestions for approaching the problem in a different way.

Upvotes: 2

Views: 310

Answers (1)

Thomas James

Reputation: 511

I found that separating the import process into batches (say, all data for one month at a time), importing with the indexes already existing in Raven, and then waiting until there were no longer any stale indexes produced the most stable results.

I used GetStatistics().StaleIndexes combined with a Thread.Sleep to make the process wait between batches, as sketched below. I left the session batch size at 1024 documents per session.
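A minimal sketch of that loop, assuming the classic RavenDB .NET client where the document store exposes DatabaseCommands.GetStatistics(). The Event type, the way the monthly batches are produced, and the 10-second poll interval are hypothetical placeholders for however the 3 million source events are actually loaded and how long you're willing to poll.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using Raven.Client;

public class Event
{
    public string Id { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class BatchedImporter
{
    public static void Import(IDocumentStore store,
                              IEnumerable<IReadOnlyList<Event>> monthlyBatches)
    {
        foreach (var month in monthlyBatches)
        {
            // Write one month of events, 1024 documents per session.
            for (var offset = 0; offset < month.Count; offset += 1024)
            {
                using (var session = store.OpenSession())
                {
                    var end = Math.Min(offset + 1024, month.Count);
                    for (var i = offset; i < end; i++)
                        session.Store(month[i]);
                    session.SaveChanges();
                }
            }

            // Wait until no index is stale before starting the next month,
            // so the map/reduce work never falls too far behind the writes.
            while (store.DatabaseCommands.GetStatistics().StaleIndexes.Length > 0)
                Thread.Sleep(TimeSpan.FromSeconds(10));
        }
    }
}
```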

Upvotes: 1
