Eugene Emelin

Reputation: 1

RavenDB processing all documents of a certain type

I have a problem updating all documents in a collection. I need to iterate through ~2 million docs, load each doc into memory, parse the HTML in one of its fields, and save the doc back to the DB.

I tried Take/Skip paging with and without indexes, by Id, etc., but some records still remain unchanged (I even tested with 1,000 records and 128 records per page). No new records are inserted while the documents are being updated. Simple patching (the patching API) does not work here because the update I need to perform is quite complex.

Please help with this. Thanks

Code:

public static int UpdateAll<T>(DocumentStore docDB, Action<T> updateAction)
{
    return UpdateAll(0, docDB, updateAction);
}

public static int UpdateAll<T>(int startFrom, DocumentStore docDB, Action<T> updateAction)
{
    using (var session = docDB.OpenSession())
    {
        int queryCount = 0;
        int start = startFrom;
        while (true)
        {
            var current = session.Query<T>().Take(128).Skip(start).ToList();
            if (current.Count == 0)
                break;

            start += current.Count;

            foreach (var doc in current)
            {
                updateAction(doc);
            }

            session.SaveChanges();
            queryCount += 2;

            if (queryCount >= 30)
            {
                return UpdateAll(start, docDB, updateAction);
            }
        }
    }

    return 1;
}

Upvotes: 0

Views: 106

Answers (1)

Dustin Hartrick

Reputation: 99

Move your session.SaveChanges(); call outside the while loop.

By design, a RavenDB session allows only a limited number of requests to the database per session instance: 30 by default (the MaxNumberOfRequestsPerSession setting).

If you refactor your code to call SaveChanges() only once (or very few times) per using block, it should work. For more information, see the RavenDB docs: Understanding The Session Object - RavenDB
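
One way to read the advice above is to open a fresh session for every page, so that each using block issues exactly one query and one SaveChanges(). The following is only a rough sketch under assumptions, not a tested solution: the class name BatchUpdater, the method name UpdateAllInBatches, and the pageSize parameter are made up for illustration, and the Raven.Client namespace assumes a 3.x-era client (newer clients expose IDocumentStore under Raven.Client.Documents). It does not deal with stale indexes or result ordering, which may also affect which documents a page returns.

```csharp
using System;
using System.Linq;
using Raven.Client; // assumption: 3.x-era client; newer clients use Raven.Client.Documents

public static class BatchUpdater
{
    // Hypothetical helper: applies updateAction to every document of type T,
    // one page at a time, and returns the number of documents processed.
    public static int UpdateAllInBatches<T>(
        IDocumentStore store, Action<T> updateAction, int pageSize = 128)
    {
        int processed = 0;

        while (true)
        {
            // A new session per page: one query plus one SaveChanges(),
            // so no single session comes near the per-session request limit.
            using (var session = store.OpenSession())
            {
                var page = session.Query<T>()
                    .Skip(processed)   // skip what earlier pages already covered
                    .Take(pageSize)    // then take the next page
                    .ToList();

                if (page.Count == 0)
                    return processed;

                foreach (var doc in page)
                {
                    updateAction(doc);
                }

                // The only write request made by this session.
                session.SaveChanges();
                processed += page.Count;
            }
        }
    }
}
```

Because the loop replaces the recursion, there is no need to count requests by hand; each session is disposed after its single page. Note that Skip is applied before Take here, so each request asks for the next pageSize documents after the ones already processed.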

Upvotes: 0
