Reputation: 1141
I'm pretty new to RavenDB and am struggling to find a solution to the following:
I have a collection called ServiceCalls that look like this:
public class ServiceCall
{
public int ID { get; set; }
public string IncidentNumber { get; set; }
public string Category { get; set; }
public string SubCategory { get; set; }
public DateTime ReportedDateTime { get; set; }
public string Block { get; set; }
public decimal Latitude { get; set; }
public decimal Longitude { get; set; }
}
I have an index named ServiceCalls/CallsByCategory that looks like this:
Map = docs => from doc in docs
select new
{
Category = doc.Category,
CategoryCount = 1,
ServiceCalls = doc,
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
CategoryCount = g.Count(),
ServiceCalls = g.Select(i => i.ServiceCalls)
};
So the output is:
public class ServiceCallsByCategory
{
public string Category { get; set; }
public int CategoryCount { get; set; }
public IEnumerable<ServiceCall> ServiceCalls { get; set; }
}
using this query everything works as it should
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory") select i
Where I am absolutely lost is writing an index that would allow me to query by ReportedDateTime. Something that would allow me to do this:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory")
where i.ServiceCalls.Any(x=>x.ReportedDateTime >= new DateTime(2012,10,1))
select i
Any guidance would be MUCH appreciated.
Upvotes: 3
Views: 1289
Reputation: 2226
Could you add ReportedDateTime to the Map and aggregate it in the Reduce? If you only care about the max per category, something like this should be sufficient.
Map = docs => from doc in docs
select new
{
Category = doc.Category,
CategoryCount = 1,
ServiceCalls = doc,
ReportedDateTime
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
CategoryCount = g.Sum(x => x.CategoryCount),
ServiceCalls = g.Select(i => i.ServiceCalls)
ReportedDateTime = g.Max(rdt => rdt.ReportedDateTime)
};
You could then query it just based on the aggregated ReportedDateTime:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory")
where i.ReportedDateTime >= new DateTime(2012,10,1)
select i
Upvotes: 0
Reputation: 241959
A few things,
You can't have a .Count()
method in your reduce clause. If you look closely, you will find your counts are wrong. As of build 2151, this will actually throw an exception. Instead, you want CategoryCount = g.Sum(x => x.CategoryCount)
You always want the structure of the map to match the structure of the reduce. If you're going to build a list of things, then you should map a single element array of each thing, and use .SelectMany()
in the reduce step. The way you have it now only works due to a quirk that will probably be fixed at some point.
By building the result as a list of ServiceCalls, you are copying the entire document into the index storage. Not only is that inefficient, but it's unnecessary. You would do better keeping a list of just the ids. Raven has an .Include()
method that you can use if you need to retrieve the full document. The main advantage here is that you are guaranteed to have the most current data for each item you get back, even if your index results are still stale.
Putting all three together, the correct index would be:
public class ServiceCallsByCategory
{
public string Category { get; set; }
public int CategoryCount { get; set; }
public int[] ServiceCallIds { get; set; }
}
public class ServiceCalls_CallsByCategory : AbstractIndexCreationTask<ServiceCall, ServiceCallsByCategory>
{
public ServiceCalls_CallsByCategory()
{
Map = docs => from doc in docs
select new {
Category = doc.Category,
CategoryCount = 1,
ServiceCallIds = new[] { doc.ID },
};
Reduce = results => from result in results
group result by result.Category
into g
select new {
Category = g.Key,
CategoryCount = g.Sum(x => x.CategoryCount),
ServiceCallIds = g.SelectMany(i => i.ServiceCallIds)
};
}
}
Querying it with includes, would look like this:
var q = session.Query<ServiceCallsByCategory, ServiceCalls_CallsByCategory>()
.Include<ServiceCallsByCategory, ServiceCall>(x => x.ServiceCallIds);
When you need a document, you still load it with session.Load<ServiceCall>(id)
but Raven will not have to make a round trip back to the server to get it.
NOW - that doesn't address your question about how to filter the results by date. For that, you really need to think about what you are trying to accomplish. All of the above would assume that you really want every service call shown for each category at once. Most of the time, that's not going to be practical because you want to paginate results. You probably DON'T want to even use what I've described above. I am making some grand assumptions here, but most of the time one would filter by category, not group by it.
Let's say you had an index that just counts the categories (the above index without the list of service calls). You might use that to display an overview screen. But you wouldn't be interested in the documents that were in each category until you clicked one and drilled into a details screen. At that point, you know which category you're in, and you can filter by it and reduce to a date range without a static index:
var q = session.Query<ServiceCall>().Where(x=> x.Category == category && x.ReportedDateTime >= datetime)
If I am wrong and you really DO need to show all documents from all categories, grouped by category, and filtered by date, then you are going to have to adopt an advanced technique like the one I described in this other StackOverflow answer. If this is really what you need, let me know in comments and I'll see if i can write it for you. You will need Raven 2.0 to make it work.
Also - be very careful about what you are storing for ReportedDateTime. If you are going to be doing any comparisons at all, you need to understand the difference between calendar time and instantaneous time. Calendar time has quirks like daylight savings transitions, time zone differences, and more. Instantaneous time tracks the moment something happened, regardless of who's asking. You probably want instantaneous time for your usage, which means either using a UTC DateTime
, or switching to DateTimeOffset
which will let you represent instantaneous time without losing the local contextual value.
Update
I experimented with trying to build an index that would use that technique I described to let you have all results in your category groups but still filter by date. Unfortunately, it's just not possible. You would have to have all ServiceCalls grouped together in the original document and express it in the Map. It doesn't work the same way at all if you have to Reduce first. So you really should just consider simple query for ServiceCalls once you are in a specific Category.
Upvotes: 4