Gordon Copestake
Gordon Copestake

Reputation: 1637

What can I do to improve the speed of this query?

I have a linq query that returns the last page a user looked at based on a table of page hits. The fields are simply TimeStamp, UserID and URL which are logged from user activity. The query looks like this:

public static IQueryable GetUserStatus()
{
    var ctx = new AppEntities();
    var currentPageHits = ctx.Pagehits
        .GroupBy(x => x.UserID)
        .Select(x => x.Where(y => y.TimeStamp == x.Max(z => z.TimeStamp)))
        .SelectMany(x => x);

    return currentPageHits.OrderByDescending(o => o.TimeStamp);
}

The query works perfectly but runs slowly. Our DBA assures us that the table has indexes in all the right places and that the trouble must be with the query.

Is there anything inherently wrong or BAD with this, or is there a more efficient way of getting the same results?

Upvotes: 0

Views: 70

Answers (2)

Tim Schmelter
Tim Schmelter

Reputation: 460048

So you try to implement DENSE_RANK() OVER (PARTITION BY UserID ORDER BY TimeStamp DESC) with LINQ? So all latest records per user-group according to the Timestamp. You could try:

public static IQueryable GetUserStatus()
{
    var ctx = new AppEntities();
    var currentPageHits = ctx.Pagehits
        .GroupBy(x => x.UserID)
        .SelectMany(x => x.GroupBy(y => y.TimeStamp).OrderByDescending(g=> g.Key).FirstOrDefault())
        .OrderByDescending(x => x.TimeStamp);

    return currentPageHits;
}

So it's grouping the user-group by TimeStamp, then it takes the latest group(one or more records in case of ties). The SelectMany flattens the goups to records. I think this is more efficient than your query.

Upvotes: 2

xanatos
xanatos

Reputation: 111830

You could try:

var currentPageHits2 = ctx.Pagehits
    .GroupBy(x => x.UserID)
    .Select(x => x.OrderByDescending(y => y.TimeStamp).First())
    .OrderByDescending(x => x.TimeStamp);

But the speed should be the same.

Note that there is a subtle difference between this query and yours... With yours, if a UserId has two "max TimeStamp" PageHits with the same TimeStamp, two "rows" will be returned, with this one only one will be returned.

Upvotes: 3

Related Questions