Stephen Patten

Reputation: 6363

RavenDb expectation of performance to query documents that number in the millions

I was able to load a couple of million documents with the embedded version of RavenDB, pretty slick!

Now I'm trying to query those items, and the performance is not what I had expected. I was hoping for near-instantaneous results, but the query takes upwards of 18 seconds on a fairly beefy machine.

Below, you'll find my naive code.

Note: I have now resolved this, and the final code is at the bottom of the post. The takeaway is that you need indexes, they have to be of the right type, and RavenDB needs to be made aware of them. VERY pleased with the perf and the quality of the records returned by the query engine.

Thank you, Stephen

using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
{
    using (IDocumentSession session = store.OpenSession())
    {
        var q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).ToList();
    }
}


[Serializable]
public class Product
{
    public decimal ProductId { get; set; }
    ....
    public string INFO2 { get; set; }
}

EDIT

I added this class

public class InfoIndex_Search : AbstractIndexCreationTask<Product>
{
    public InfoIndex_Search()
    {
        Map = products =>
            from p in products
            select new { Info2Index = p.INFO2 };

        Index(x => x.INFO2, FieldIndexing.Analyzed);
    }
}

and changed the calling method to look like this.

using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
{
    // Tell Raven to create our indexes.
    IndexCreation.CreateIndexes(Assembly.GetExecutingAssembly(), store);

    List<Product> q = null;
    Stopwatch watch = Stopwatch.StartNew(); // time only the query
    using (IDocumentSession session = store.OpenSession())
    {
        q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).ToList();
        watch.Stop();
    }
}

But the search still takes 18 seconds. What am I missing? On another note, there are quite a few new files in the C:\temp\ravendata\Indexes\InfoIndex%2fSearch folder, although not nearly as many as when I inserted the data; they seem to have all but disappeared after running this code a few times while trying to query. Should IndexCreation.CreateIndexes(Assembly.GetExecutingAssembly(), store); be called prior to the insert, and only then?

EDIT1

Using this code I was able to get the query to return almost instantly, but it seems you can only run it once, which raises the question: where should this be run, and what are the proper initialization procedures?

store.DatabaseCommands.PutIndex("ProdcustByInfo2", new IndexDefinitionBuilder<Product>
{
    Map = products => from product in products
                      select new { product.INFO2 },
    Indexes =
            {
                { x => x.INFO2, FieldIndexing.Analyzed}
            }
});

EDIT2: working example

static void Main()
{
    Stopwatch watch = Stopwatch.StartNew();

    int q = 0;
    using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
    {
        if (store.DatabaseCommands.GetIndex("ProdcustByInfo2") == null)
        {
            store.DatabaseCommands.PutIndex("ProdcustByInfo2", new IndexDefinitionBuilder<Product>
            {
                Map = products => from product in products
                                  select new { product.INFO2 },
                Indexes = { { x => x.INFO2, FieldIndexing.Analyzed } }
            });
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to create index {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);

        watch = Stopwatch.StartNew();               
        using (IDocumentSession session = store.OpenSession())
        {
            q = session.Query<Product>().Count();
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to query for products values {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        Console.WriteLine("Total number of products loaded: {0}{1}", q, System.Environment.NewLine);

        if (q == 0)
        {
            watch = Stopwatch.StartNew();
            var productsList = Parsers.GetProducts().ToList();
            watch.Stop();
            Console.WriteLine("Time elapsed to parse: {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
            Console.WriteLine("Total number of items parsed: {0}{1}", productsList.Count, System.Environment.NewLine);

            watch = Stopwatch.StartNew();
            productsList.RemoveAll(_ => _ == null);
            watch.Stop();
            Console.WriteLine("Time elapsed to remove null values {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
            Console.WriteLine("Total number of items loaded: {0}{1}", productsList.Count, System.Environment.NewLine);

            watch = Stopwatch.StartNew();
            int batch = 0;
            var session = store.OpenSession();
            foreach (var product in productsList)
            {
                batch++;
                session.Store(product);
                if (batch % 128 == 0)
                {
                    session.SaveChanges();
                    session.Dispose();
                    session = store.OpenSession();
                }
            }
            session.SaveChanges();
            session.Dispose();
            watch.Stop();
            Console.WriteLine("Time elapsed to populate db from collection {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        }

        watch = Stopwatch.StartNew();
        using (IDocumentSession session = store.OpenSession())
        {
            q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).Count();
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to query for term {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        Console.WriteLine("Total number of items found: {0}{1}", q, System.Environment.NewLine);
    }
    Console.ReadLine();
}

Upvotes: 4

Views: 1547

Answers (2)

Bob Horn

Reputation: 34297

First, do you have an index covering INFO2?

Second, see Daniel Lang's "Searching on string properties in RavenDB" blog post here:

http://daniellang.net/searching-on-string-properties-in-ravendb/

If it helps, here's how I created an index:

public class LogMessageCreatedTime : AbstractIndexCreationTask<LogMessage>
{
    public LogMessageCreatedTime()
    {
        Map = messages => from message in messages
                          select new { MessageCreatedTime = message.MessageCreatedTime };
    }
}

And how I added it at runtime:

private static DocumentStore GetDatabase()
{            
    DocumentStore documentStore = new DocumentStore();            

    try
    {
        documentStore.ConnectionStringName = "RavenDb";                
        documentStore.Initialize();

        // Tell Raven to create our indexes.
        IndexCreation.CreateIndexes(typeof(DataAccessFactory).Assembly, documentStore);
    }
    catch
    {
        documentStore.Dispose();
        throw;
    }

    return documentStore;
}

In my case, I didn't have to query the index explicitly; it was just used when I queried normally.
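
If you would rather name the index than rely on Raven selecting it, the client also accepts the index type as a second generic argument to Query. What follows is a minimal sketch, assuming the same era of the RavenDB client as the question (the one that ships AbstractIndexCreationTask and FieldIndexing.Analyzed). The Products_ByInfo2 index class and the ProductQueries helper are hypothetical names made up for illustration, and the Product class is trimmed to the two properties shown in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Abstractions.Indexing;
using Raven.Client;
using Raven.Client.Indexes;

public class Product
{
    public decimal ProductId { get; set; }
    public string INFO2 { get; set; }
}

// Hypothetical index class (not from the question). The projected field keeps
// the name INFO2, so a query on x.INFO2 maps onto this index field.
public class Products_ByInfo2 : AbstractIndexCreationTask<Product>
{
    public Products_ByInfo2()
    {
        Map = products => from p in products
                          select new { p.INFO2 };

        // Analyzed, as in the question's own index definitions.
        Index(x => x.INFO2, FieldIndexing.Analyzed);
    }
}

public static class ProductQueries
{
    // Hypothetical helper: runs the prefix search against the named index.
    public static List<Product> StartingWith(IDocumentStore store, string prefix)
    {
        using (IDocumentSession session = store.OpenSession())
        {
            return session.Query<Product, Products_ByInfo2>()
                          .Where(x => x.INFO2.StartsWith(prefix))
                          .ToList();
        }
    }
}
```

Naming the index makes the query fail loudly if the index was never registered, instead of quietly falling back to a dynamic one. Keep in mind that the client pages results, so ToList() returns one page rather than every match unless you ask for more with Take.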

Upvotes: 6

user111013

As Bob hints at, you should ensure you create indexes in Raven that cover the fields you intend to query.

Raven is quite fast and lets you go a long way on dynamic queries without much setup. However, once you get into large-ish document counts, or need something non-default such as an analyzed field, you will find that you need static indexes.

There are plenty of examples on setting up and using indexes in Raven.
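
As one such example, here is a rough sketch of registering a static index at application startup and then waiting for it to catch up before timing a query. It assumes the embedded client used in the question. The index class name Products_ByInfo2 and the five-minute timeout are illustrative assumptions, not something taken from the question, and Product is trimmed to the properties the question shows.

```csharp
using System;
using System.Linq;
using Raven.Abstractions.Indexing;
using Raven.Client;
using Raven.Client.Embedded;
using Raven.Client.Indexes;

public class Product
{
    public decimal ProductId { get; set; }
    public string INFO2 { get; set; }
}

// Hypothetical static index covering INFO2.
public class Products_ByInfo2 : AbstractIndexCreationTask<Product>
{
    public Products_ByInfo2()
    {
        Map = products => from p in products select new { p.INFO2 };
        Index(x => x.INFO2, FieldIndexing.Analyzed);
    }
}

public static class Program
{
    public static void Main()
    {
        using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
        {
            // Registers every index class found in this assembly. Intended to
            // run once per application start, right after Initialize().
            IndexCreation.CreateIndexes(typeof(Program).Assembly, store);

            using (IDocumentSession session = store.OpenSession())
            {
                // Indexing runs in the background, so right after a bulk load
                // the index may be stale. Waiting here (timeout is an assumed
                // value) keeps a benchmark from measuring a half-built index.
                int count = session.Query<Product>()
                    .Customize(x => x.WaitForNonStaleResultsAsOfNow(TimeSpan.FromMinutes(5)))
                    .Where(x => x.INFO2.StartsWith("SYS"))
                    .Count();

                Console.WriteLine("Matches: {0}", count);
            }
        }
    }
}
```

Waiting for non-stale results is mainly useful in tests and one-off measurements like the one in the question. In normal application code you would usually accept slightly stale results rather than block on indexing.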

Upvotes: 0
