Mouhong Lin

Reputation: 4509

Raven DB: What's wrong with this multi-map/reduce index?

I have an application to track page visits for a website. Here's my model:

public class VisitSession {
    public string SessionId { get; set; }
    public DateTime StartTime { get; set; }
    public string UniqueVisitorId { get; set; }
    public IList<PageVisit> PageVisits { get; set; }
}

When a visitor goes to the website, a visit session starts. One visit session has many page visits. The first time a visitor comes to the website, the tracker writes a UniqueVisitorId (GUID) cookie, so we can tell whether a visitor is a returning visitor.

Now I want to write a view that displays TotalVisitSessions, TotalPageVisits and TotalUniqueVisitors for each day, so I wrote this multi-map/reduce index:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
                                            select new VisitSummaryByDate
                                            {
                                                Date = s.StartTime.Date,
                                                TotalVisitSessions = 1,
                                                TotalPageVisits = 0,
                                                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                                                TotalUniqueVisitors = 0,
                                                UniqueVisitorId = s.UniqueVisitorId
                                            });

        AddMap<PageVisit>(visits => from v in visits
                                    select new VisitSummaryByDate
                                    {
                                        Date = v.VisitTime.Date,
                                        TotalVisitSessions = 0,
                                        TotalPageVisits = 1,
                                        TotalNewVisitors = 0,
                                        TotalUniqueVisitors = 0,
                                        UniqueVisitorId = String.Empty
                                    });

        Reduce = results => from result in results
                            group result by result.Date into g
                            select new VisitSummaryByDate
                            {
                                Date = g.Key,
                                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(it => it.Length > 0).Distinct().Count(),
                                UniqueVisitorId = String.Empty
                            };
    }
}

The problem is the TotalUniqueVisitors calculation: in the index results, TotalUniqueVisitors is sometimes 1 and sometimes 2, but I checked the data and it should never be that low. Is there something wrong with my map/reduce syntax?
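One way to check what the index should return is to compute the same daily summary in memory with LINQ-to-Objects and compare it with the index output. This is only a sketch: the tuple shape and the sample sessions below are hypothetical and are not the data from the gist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (session start time, UniqueVisitorId cookie value, page visits in the session).
var sessions = new List<(DateTime StartTime, string UniqueVisitorId, int PageVisitCount)>
{
    (new DateTime(2012, 5, 15, 9, 0, 0), "visitor-a", 3),
    (new DateTime(2012, 5, 15, 10, 0, 0), "visitor-b", 1),
    (new DateTime(2012, 5, 15, 11, 0, 0), "visitor-a", 2), // returning visitor, same day
    (new DateTime(2012, 5, 16, 9, 30, 0), "visitor-c", 4),
};

// Group by calendar day and compute the totals the index is meant to produce.
var expected = sessions
    .GroupBy(s => s.StartTime.Date)
    .Select(g => (
        Date: g.Key,
        TotalVisitSessions: g.Count(),
        TotalPageVisits: g.Sum(s => s.PageVisitCount),
        TotalUniqueVisitors: g.Select(s => s.UniqueVisitorId).Distinct().Count()))
    .OrderBy(r => r.Date)
    .ToList();

foreach (var r in expected)
    Console.WriteLine($"{r.Date:yyyy-MM-dd}: sessions={r.TotalVisitSessions}, pages={r.TotalPageVisits}, unique={r.TotalUniqueVisitors}");
```

Because everything is grouped in one pass here, the distinct count always sees every visitor ID for the day, which is exactly the assumption the index's reduce makes.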

Related post: Raven DB: How to create "UniqueVisitorCount by date" index

Code with sample data can be found here: https://gist.github.com/2702071

Upvotes: 4

Views: 1143

Answers (2)

Ayende Rahien

Reputation: 22956

The reduce function is actually run multiple times over the results, including over the output of earlier reduce passes. Your index assumes it runs only once, with access to the entire result set.
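The effect can be simulated in memory. The sketch below mirrors the question's reduce as a plain LINQ function over hypothetical tuples (the `Reduce` name, tuple fields and visitor IDs are made up for illustration; this is not RavenDB code) and compares reducing everything at once with reducing two batches and then reducing the partial results again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the question's reduce: the distinct-ID count is computed, but each
// output row carries VisitorId = "" so the IDs themselves are thrown away.
static List<(DateTime Date, int Sessions, int Pages, int Unique, string VisitorId)> Reduce(
    IEnumerable<(DateTime Date, int Sessions, int Pages, int Unique, string VisitorId)> rows) =>
    rows.GroupBy(r => r.Date)
        .Select(g => (
            Date: g.Key,
            Sessions: g.Sum(r => r.Sessions),
            Pages: g.Sum(r => r.Pages),
            Unique: g.Select(r => r.VisitorId).Where(id => id.Length > 0).Distinct().Count(),
            VisitorId: string.Empty))
        .ToList();

var day = new DateTime(2012, 5, 15);
var mapped = new List<(DateTime Date, int Sessions, int Pages, int Unique, string VisitorId)>
{
    (day, 1, 0, 0, "visitor-a"), // VisitSession map outputs
    (day, 1, 0, 0, "visitor-b"),
    (day, 1, 0, 0, "visitor-c"),
    (day, 0, 1, 0, ""),          // PageVisit map outputs
    (day, 0, 1, 0, ""),
};

// Single pass: the reduce sees every map output at once.
var once = Reduce(mapped);

// Re-reduce: reduce two batches separately, then reduce the partial results.
var twice = Reduce(Reduce(mapped.Take(2)).Concat(Reduce(mapped.Skip(2))));

Console.WriteLine($"single pass: unique={once[0].Unique}"); // 3
Console.WriteLine($"re-reduced:  unique={twice[0].Unique}"); // 0
```

The sums (Sessions, Pages) survive the second pass because summing partial sums gives the same total, but the distinct count does not: the second pass only sees empty IDs.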

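Your index needs to look like this (UniqueVisitorId becomes a collection of IDs, such as string[], so the IDs can be carried forward between reduce passes):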

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
                                         select new VisitSummaryByDate
                                         {
                                             Date = s.StartTime.Date,
                                             TotalVisitSessions = 1,
                                             TotalPageVisits = 0,
                                             TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                                             TotalUniqueVisitors = 1,
                                             UniqueVisitorId = new[]{s.UniqueVisitorId}
                                         });

        AddMap<PageVisit>(visits => from v in visits
                                    select new VisitSummaryByDate
                                    {
                                        Date = v.VisitTime.Date,
                                        TotalVisitSessions = 0,
                                        TotalPageVisits = 1,
                                        TotalNewVisitors = 0,
                                        TotalUniqueVisitors = 0,
                                        UniqueVisitorId = new string[0]
                                    });

        Reduce = results => from result in results
                            group result by result.Date into g
                            select new VisitSummaryByDate
                            {
                                Date = g.Key,
                                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
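                                TotalUniqueVisitors = g.SelectMany(x => x.UniqueVisitorId).Distinct().Count(),
                                // Keep the distinct IDs in the output: the reduce may run again over its own results
                                UniqueVisitorId = g.SelectMany(x => x.UniqueVisitorId).Distinct().ToArray()
                            };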
    }
}
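The property that makes this shape work is that the reduce output is valid input to the same reduce, so reducing in batches and then re-reducing gives the same answer as a single pass. A minimal in-memory sketch of that check, using hypothetical tuples and a local `Reduce` function in place of the RavenDB index:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Carries the distinct visitor IDs forward as an array; SelectMany flattens
// the per-row arrays before de-duplicating, so partial results merge correctly.
static List<(DateTime Date, int Sessions, int Pages, int Unique, string[] VisitorIds)> Reduce(
    IEnumerable<(DateTime Date, int Sessions, int Pages, int Unique, string[] VisitorIds)> rows) =>
    rows.GroupBy(r => r.Date)
        .Select(g =>
        {
            var ids = g.SelectMany(r => r.VisitorIds).Distinct().ToArray();
            return (Date: g.Key,
                    Sessions: g.Sum(r => r.Sessions),
                    Pages: g.Sum(r => r.Pages),
                    Unique: ids.Length,
                    VisitorIds: ids);
        })
        .ToList();

var day = new DateTime(2012, 5, 15);
var mapped = new List<(DateTime Date, int Sessions, int Pages, int Unique, string[] VisitorIds)>
{
    (day, 1, 0, 1, new[] { "visitor-a" }), // VisitSession map outputs
    (day, 1, 0, 1, new[] { "visitor-b" }),
    (day, 1, 0, 1, new[] { "visitor-a" }), // returning visitor, same day
    (day, 0, 1, 0, Array.Empty<string>()), // PageVisit map outputs
    (day, 0, 1, 0, Array.Empty<string>()),
};

var once = Reduce(mapped);
var twice = Reduce(Reduce(mapped.Take(2)).Concat(Reduce(mapped.Skip(2))));

Console.WriteLine($"single pass: unique={once[0].Unique}"); // 2
Console.WriteLine($"re-reduced:  unique={twice[0].Unique}"); // 2
```

Summing the per-session `1` values instead would report 3 for this day, because visitor-a had two sessions; taking the distinct count of the flattened IDs avoids that.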

Upvotes: 2

Simon

Reputation: 5503

The correct index is:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
                                         select new VisitSummaryByDate
                                         {
                                             Date = s.StartTime.Date,
                                             TotalVisitSessions = 1,
                                             TotalPageVisits = 0,
                                             TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                                             TotalUniqueVisitors = 0,
                                             UniqueVisitorId = s.UniqueVisitorId
                                         });

        AddMap<PageVisit>(visits => from v in visits
                                    select new VisitSummaryByDate
                                    {
                                        Date = v.VisitTime.Date,
                                        TotalVisitSessions = 0,
                                        TotalPageVisits = 1,
                                        TotalNewVisitors = 0,
                                        TotalUniqueVisitors = 0,
                                        UniqueVisitorId = string.Empty,
                                    });

        Reduce = results => from result in results
                            group result by result.Date into g
                            select new VisitSummaryByDate
                            {
                                Date = g.Key,
                                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(x => x.Length > 0).Distinct().Count(),
                                UniqueVisitorId = g.FirstOrDefault().UniqueVisitorId,
                            };
    }
}

The difference is that UniqueVisitorId is set in the reduce. I'm not 100% certain why this is needed yet, I must admit.
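A possible explanation can be sketched in memory, assuming (as the other answer describes) that the reduce is also applied to its own partial results. The tuple shape, the local `Reduce` function and the visitor IDs are hypothetical. With `UniqueVisitorId = String.Empty` in the output, a second pass has no IDs to count; with `FirstOrDefault`, each partial result forwards exactly one ID, so the re-reduced count depends on how the results were batched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors this answer's reduce: the output keeps the first row's visitor ID.
static List<(DateTime Date, int Sessions, int Pages, int Unique, string VisitorId)> Reduce(
    IEnumerable<(DateTime Date, int Sessions, int Pages, int Unique, string VisitorId)> rows) =>
    rows.GroupBy(r => r.Date)
        .Select(g => (
            Date: g.Key,
            Sessions: g.Sum(r => r.Sessions),
            Pages: g.Sum(r => r.Pages),
            Unique: g.Select(r => r.VisitorId).Where(id => id.Length > 0).Distinct().Count(),
            VisitorId: g.First().VisitorId)) // groups are never empty, so First() is safe
        .ToList();

var day = new DateTime(2012, 5, 15);
var mapped = new List<(DateTime Date, int Sessions, int Pages, int Unique, string VisitorId)>
{
    (day, 1, 0, 0, "visitor-a"),
    (day, 1, 0, 0, "visitor-b"),
    (day, 1, 0, 0, "visitor-c"),
    (day, 1, 0, 0, "visitor-d"),
    (day, 0, 1, 0, ""), // PageVisit map output
};

var once = Reduce(mapped);
// Each batch forwards one ID ("visitor-a" and "visitor-c"), so the second pass counts 2.
var twice = Reduce(Reduce(mapped.Take(2)).Concat(Reduce(mapped.Skip(2))));

Console.WriteLine($"single pass: unique={once[0].Unique}"); // 4
Console.WriteLine($"re-reduced:  unique={twice[0].Unique}"); // 2
```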

Upvotes: 2
