JanivZ

Reputation: 2325

RavenDB DocumentStore throws OutOfMemoryException

I have code like this:

public bool Set(IEnumerable<WhiteForest.Common.Entities.Projections.RequestProjection> requests)
{
    var documentSession = _documentStore.OpenSession();
    //{
    try
    {
        foreach (var request in requests)
        {
            documentSession.Store(request);
        }
        //requests.AsParallel().ForAll(x => documentSession.Store(x));
        documentSession.SaveChanges();
        documentSession.Dispose();
        return true;
    }
    catch (Exception e)
    {
        _log.LogDebug("Exception in RavenRequstRepository - Set. Exception is [{0}]", e.ToString());
        return false;
    }
    //}
}

This code gets called many times. After around 50,000 documents have passed through it, I get an OutOfMemoryException. Any idea why? Perhaps after a while I need to declare a new DocumentStore?

Thank you.


I ended up using the Batch/Patch API to perform the update I needed. You can see the discussion here: https://groups.google.com/d/topic/ravendb/3wRT9c8Y-YE/discussion

Basically, since I only needed to update one property on my objects, and after considering Ayende's comments about re-serializing all the objects back to JSON, I did something like this:

internal void Patch()
{
    List<string> docIds = new List<string>() { "596548a7-61ef-4465-95bc-b651079f4888", "cbbca8d5-be45-4e0d-91cf-f4129e13e65e" };
    using (var session = _documentStore.OpenSession())
    {
        session.Advanced.DatabaseCommands.Batch(GenerateCommands(docIds));
    }
}

private List<ICommandData> GenerateCommands(List<string> docIds)
{
    List<ICommandData> retList = new List<ICommandData>();

    foreach (var item in docIds)
    {
        retList.Add(new PatchCommandData()
        {
            Key = item,
            Patches = new[]
            {
                new Raven.Abstractions.Data.PatchRequest()
                {
                    Name = "Processed",
                    Type = Raven.Abstractions.Data.PatchCommandType.Set,
                    Value = new RavenJValue(true)
                }
            }
        });
    }

    return retList;
}

Hope this helps ...

Thanks a lot.

Upvotes: 4

Views: 1591

Answers (3)

jocull

Reputation: 21095

DocumentStore is a disposable class, so I worked around this problem by disposing the instance after each chunk. I highly doubt this is the most efficient way to run operations, but it prevents significant memory overhead from building up.

I was running a sort of "delete all" operation like so. You can see the using blocks disposing both the DocumentStore and the IDocumentSession objects after each chunk.

static DocumentStore GetDataStore()
{
    DocumentStore ds = new DocumentStore
    {
        DefaultDatabase = "test",
        Url = "http://localhost:8080"
    };

    ds.Initialize();
    return ds;
}

static IDocumentSession GetDbInstance(DocumentStore ds)
{
    return ds.OpenSession();
}

static void Main(string[] args)
{
    // Track how many documents were deleted in the last chunk and in total.
    int deleteCount = 0;
    int deleteSum = 0;

    do
    {
        using (var ds = GetDataStore())
        using (var db = GetDbInstance(ds))
        {
            //The `Take` operation will cap out at 1,024 by default, per Raven documentation
            var list = db.Query<MyClass>().Skip(deleteSum).Take(5000).ToList();
            deleteCount = list.Count;
            deleteSum += deleteCount;

            foreach (var item in list)
            {
                db.Delete(item);
            }
            db.SaveChanges();
            list.Clear();
        }
    } while (deleteCount > 0);
}

Upvotes: 0

Bob Horn

Reputation: 34297

I just did this for my current project. I chunked the data into pieces and saved each chunk in a new session. This may work for you, too.

Note that this example chunks by 1,024 documents at a time, but only when there are at least 2,000 objects; below that, chunking isn't worth the trouble. So far, my inserts have gotten the best performance with a chunk size of 4,096. I think that's because my documents are relatively small.

internal static void WriteObjectList<T>(List<T> objectList)
{
    int numberOfObjectsThatWarrantChunking = 2000;  // Don't bother chunking unless we have at least this many objects.

    if (objectList.Count < numberOfObjectsThatWarrantChunking)
    {
        // Just write them all at once.
        using (IDocumentSession ravenSession = GetRavenSession())
        {
            objectList.ForEach(x => ravenSession.Store(x));
            ravenSession.SaveChanges();
        }

        return;
    }

    int numberOfDocumentsPerSession = 1024;  // Chunk size

    List<List<T>> objectListInChunks = new List<List<T>>();

    for (int i = 0; i < objectList.Count; i += numberOfDocumentsPerSession)
    {
        objectListInChunks.Add(objectList.Skip(i).Take(numberOfDocumentsPerSession).ToList());
    }

    Parallel.ForEach(objectListInChunks, listOfObjects =>
    {
        using (IDocumentSession ravenSession = GetRavenSession())
        {
            listOfObjects.ForEach(x => ravenSession.Store(x));
            ravenSession.SaveChanges();
        }
    });
}

private static IDocumentSession GetRavenSession()
{
    return _ravenDatabase.OpenSession();
}
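
The snippet doesn't show where _ravenDatabase comes from; it assumes a single DocumentStore that is created and initialized once and then reused for the lifetime of the application. A minimal sketch of that assumed setup (the URL and database name here are placeholders) could look like this:

// Assumed setup: one DocumentStore for the whole application, initialized once.
private static readonly IDocumentStore _ravenDatabase = new DocumentStore
{
    Url = "http://localhost:8080",   // placeholder server URL
    DefaultDatabase = "MyDatabase"   // placeholder database name
}.Initialize();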

Upvotes: 4

Ayende Rahien

Reputation: 22956

Are you trying to save it all in one call? The DocumentSession needs to turn all of the objects that you pass it into a single request to the server. That means it may allocate a lot of memory for the write to the server. We usually recommend batches of about 1,024 items if you are doing bulk saves.
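
As a rough sketch of that advice applied to the Set method from the question (not code from this answer): open a fresh session per batch of roughly 1,024 documents and call SaveChanges once per batch, so no single request has to carry all 50,000 documents.

public bool Set(IEnumerable<WhiteForest.Common.Entities.Projections.RequestProjection> requests)
{
    const int batchSize = 1024;  // roughly the recommended bulk-save batch size

    try
    {
        // Group the incoming stream into consecutive batches of batchSize items.
        var batches = requests
            .Select((request, index) => new { request, index })
            .GroupBy(x => x.index / batchSize, x => x.request);

        foreach (var batch in batches)
        {
            // A short-lived session per batch keeps the change tracker and
            // the payload sent to the server small.
            using (var session = _documentStore.OpenSession())
            {
                foreach (var request in batch)
                {
                    session.Store(request);
                }
                session.SaveChanges();
            }
        }
        return true;
    }
    catch (Exception e)
    {
        _log.LogDebug("Exception in RavenRequstRepository - Set. Exception is [{0}]", e.ToString());
        return false;
    }
}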

Upvotes: 2
