Reputation: 4509
I have an application to track page visits for a website. Here's my model:
public class VisitSession
{
    public string SessionId { get; set; }
    public DateTime StartTime { get; set; }
    public string UniqueVisitorId { get; set; }
    public IList<PageVisit> PageVisits { get; set; }
}
When a visitor comes to the website, a visit session starts. One visit session has many page visits. The tracker writes a UniqueVisitorId (GUID) cookie the first time a visitor comes to the website, so we can tell whether a visitor is a returning visitor.
Now I want to write a view that displays TotalVisitSessions, TotalPageVisits, and TotalUniqueVisitors for each day. So I wrote this multi-map/reduce index:
public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
            select new VisitSummaryByDate
            {
                Date = s.StartTime.Date,
                TotalVisitSessions = 1,
                TotalPageVisits = 0,
                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = s.UniqueVisitorId
            });
        AddMap<PageVisit>(visits => from v in visits
            select new VisitSummaryByDate
            {
                Date = v.VisitTime.Date,
                TotalVisitSessions = 0,
                TotalPageVisits = 1,
                TotalNewVisitors = 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = String.Empty
            });
        Reduce = results => from result in results
            group result by result.Date into g
            select new VisitSummaryByDate
            {
                Date = g.Key,
                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(it => it.Length > 0).Distinct().Count(),
                UniqueVisitorId = String.Empty
            };
    }
}
The problem is the TotalUniqueVisitors calculation: in the index results, TotalUniqueVisitors is sometimes 1 and sometimes 2. I checked the data, and the real number is never that low. Is there something wrong with my Map/Reduce syntax?
Related post: Raven DB: How to create "UniqueVisitorCount by date" index
Code with sample data can be found here: https://gist.github.com/2702071
Upvotes: 4
Views: 1143
Reputation: 22956
Reduce is actually run multiple times over the results. Your index assumes that it runs only once and has access to the entire result set.
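To make the effect concrete, here is a minimal in-memory sketch in plain LINQ. It does not use RavenDB at all: `Entry`, `NaiveReduce`, `SetReduce`, `TwoPass`, and `Visit` are hypothetical names, and splitting the data into two batches is only an assumption meant to imitate a reduce being applied to partial results and then to its own output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one map/reduce entry of a single day.
public class Entry
{
    public int TotalUniqueVisitors;
    public string[] VisitorIds = new string[0];
}

public static class ReReduceDemo
{
    // Mirrors the question's reduce: counts distinct IDs, then discards them.
    public static Entry NaiveReduce(IEnumerable<Entry> entries)
    {
        return new Entry
        {
            TotalUniqueVisitors = entries.SelectMany(e => e.VisitorIds).Distinct().Count(),
            VisitorIds = new string[0] // IDs are dropped, like UniqueVisitorId = String.Empty
        };
    }

    // Keeps the distinct IDs in the output so a later pass can still merge them.
    public static Entry SetReduce(IEnumerable<Entry> entries)
    {
        var ids = entries.SelectMany(e => e.VisitorIds).Distinct().ToArray();
        return new Entry { TotalUniqueVisitors = ids.Length, VisitorIds = ids };
    }

    // Reduces two batches separately, then reduces the two partial results.
    public static int TwoPass(Func<IEnumerable<Entry>, Entry> reduce, Entry[] batch1, Entry[] batch2)
    {
        return reduce(new[] { reduce(batch1), reduce(batch2) }).TotalUniqueVisitors;
    }

    // Builds a map-style entry for a single visitor ID.
    public static Entry Visit(string id)
    {
        return new Entry { VisitorIds = new[] { id } };
    }
}
```

With batches {a, b} and {b, c}, `TwoPass(ReReduceDemo.NaiveReduce, ...)` returns 0 because the second pass only sees entries whose IDs were already thrown away, while `TwoPass(ReReduceDemo.SetReduce, ...)` returns 3. In a real index, a batch may mix fresh map entries with already-reduced ones, which would explain counts that are small but not zero.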
Your index needs to look like this:
public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
            select new VisitSummaryByDate
            {
                Date = s.StartTime.Date,
                TotalVisitSessions = 1,
                TotalPageVisits = 0,
                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                TotalUniqueVisitors = 1,
                UniqueVisitorId = new[] { s.UniqueVisitorId }
            });
        AddMap<PageVisit>(visits => from v in visits
            select new VisitSummaryByDate
            {
                Date = v.VisitTime.Date,
                TotalVisitSessions = 0,
                TotalPageVisits = 1,
                TotalNewVisitors = 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = new string[0]
            });
        Reduce = results => from result in results
            group result by result.Date into g
            select new VisitSummaryByDate
            {
                Date = g.Key,
                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                TotalUniqueVisitors = g.Sum(it => it.TotalUniqueVisitors),
                // Flatten the per-entry ID arrays and keep the distinct IDs,
                // so a later reduce pass can still merge them.
                UniqueVisitorId = g.SelectMany(x => x.UniqueVisitorId).Distinct().ToArray()
            };
    }
}
Upvotes: 2
Reputation: 5503
The correct index is:
public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
            select new VisitSummaryByDate
            {
                Date = s.StartTime.Date,
                TotalVisitSessions = 1,
                TotalPageVisits = 0,
                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = s.UniqueVisitorId
            });
        AddMap<PageVisit>(visits => from v in visits
            select new VisitSummaryByDate
            {
                Date = v.VisitTime.Date,
                TotalVisitSessions = 0,
                TotalPageVisits = 1,
                TotalNewVisitors = 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = string.Empty,
            });
        Reduce = results => from result in results
            group result by result.Date into g
            select new VisitSummaryByDate
            {
                Date = g.Key,
                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(x => x.Length > 0).Distinct().Count(),
                UniqueVisitorId = g.FirstOrDefault().UniqueVisitorId,
            };
    }
}
The difference is that the reduce now carries UniqueVisitorId forward (taken from the first entry in each group) instead of resetting it to an empty string. I must admit I'm not yet 100% certain why this is needed.
Upvotes: 2