Chi Chan
Chi Chan

Reputation: 12350

What is the best way to query information out of a collection from within a document in RavenDB

For example say I have the following class

public class Parent
{
    public string Id { get; set; }
    public string Name { get; set; }

    public IList<Child> Children { get; set; }

    public class Child
    {
        public string Name { get; set; }
        public DateTime Date { get; set; }
    }
}

And the following document

var document = new Parent
{
    Id = "parent/1",
    Name = "1",
    Children = new List<Parent.Child>
    {
        new Parent.Child {Date = new DateTime(2013, 1, 2)},
        new Parent.Child {Date = new DateTime(2013, 1, 3)},
        new Parent.Child {Date = new DateTime(2013, 1, 4)},
    }
};

How do I:

  1. Ask RavenDB to give me the instance of Child with date 2013/1/4 without sending the whole document over?
  2. Profile my query to make sure I am doing the correct thing?

Upvotes: 1

Views: 183

Answers (1)

Matt Johnson-Pint
Matt Johnson-Pint

Reputation: 241485

The simple answer is that you don't want to do this. The whole point of a document database is that a document represents an entire aggregate root entity (in Domain Driven Design terminology). One would expect that in the document that you described that all "Children" would be relevant to the root "Parent".

For a real world example, consider a SalesOrder that has LineItems. Would it makes sense to retrieve just a single line from the order? Probably not. You would retrieve the entire SalesOrder - including all LineItems.

This is an important concept to understand. In an RDBMS (like SQL Server), you would need two separate tables to store an order and its line items. To retrieve an order, you'd have to read 1 record from the SalesOrders table, and N records from the LineItems table. This anti-pattern is commonly referred to as "Select N+1", and it causes major scalability problems when, for example, retrieving many orders in bulk.

So the best answer is, don't query at all. Do a simple session.Load<Parent>("parents/1") and you will have all the children data as well, with a single call to the database. Thus avoiding Select N+1 issues.

...

Now for the more complex answer - yes you can get just part of the document back. Typically you wouldn't do this for a single document though. You might want to do it for querying against multiple documents that have specific criteria.

For example, say you want to answer the query, "Give me all Children that have a specific Date". This would be a query with an index projection.

By default, Raven is geared towards returning entire documents, not partial documents. In other words, we would normally answer the related query "Give me all Parents that have any Children that have a specific date". That query looks like this:

var parents = session.Query<Parent>()
                     .Where(p=> p.Children.Any(c=> c.Date == theDate))
                     .ToList();

You could take the results of this query and filter out the children you wanted using linq-to-objects on the client side (after the above ToList call)

var children = parents.SelectMany(p=> p.Children.Where(c=> c.Date == theDate));

This would work, but it's not that efficient because you threw out a lot of the data you got back from the database.

The alternative would be to do this in one call using a static index with a projection. First, the index definition:

public class ChildrenIndex : AbstractIndexCreationTask<Parent>
{
    public ChildrenIndex()
    {
        Map = parents => from parent in parents
                         from child in parent.Children
                         select new
                         {
                             child.Date,
                             child.Name
                         };

        StoreAllFields(FieldStorage.Yes);
    }
}

Then the query:

session.Query<Parent.Child, ChildrenIndex>()
       .Where(x => x.Date == theDate)
       .AsProjection<Parent.Child>();

Note that the Parent.Child syntax is just because you have Child as a nested class of Parent.

In the index, we map the fields that we are going to want to project, and we store those fields so they can be retrieved. I used the StoreAllFields shorthand syntax, but you could also just mark each field separately with Store("FieldName", FieldStorage.Yes).

In the query, the first time I specify Parent.Child is so I can query by it in the Where predicate. Left alone here, the result would still be the containing Parent. So we use AsProjection to tell raven we really want to project the fields that we stored, rather than giving back the document that matched the index terms.

...

Now hopefully you can see that this is overkill for returning just one child out of one parent, and you should just load the whole parent for that.

If you find yourself loading specific children a lot, it could be that a child in your context really is its own aggregate and should be in its own document. You could still relate the parent and child, either by storing a ParentId on the child document, or a ChildrenIds list on the parent, or both.

Upvotes: 2

Related Questions