mrówa
mrówa

Reputation: 5771

Why running generic aggregation method is many times slower?

I'm having performance problem with Entity Framework while using generic methods for data aggregation. When querying max value of (indexed) Id column of table with few hundred thousands of rows using generic methods I'm getting huge performance drop. I'm using code-generated int keys instead of sql identities, this code is used to get next new Id.

Here's sample illustration. MaxTyped uses Max on db.Posts (which is DbSet<Post>), while MaxGeneric uses generic method max to do the same.

static int MaxTyped()
{
    using (var db = new BloggingContext())
    {                    
        return db.Posts.Max(p => p.PostId);
    }
}

static int MaxGeneric()
{
    using (var db = new BloggingContext())
    {
        return max(db.Posts, p => p.PostId);
    }
}

static int max<T>(DbSet<T> set, Func<T, int> func) where T : class
{
    // intellisense says its IEnumerable.Max
    return set.Max(func);
}

On my not so old home desktop with quite fast hdd, MaxTyped on 100k rows runs in 0.5s, while MaxGeneric around 6.5s. This is one order of magnitude slower.

On my office old testing server we're having times of few seconds vs almost 10 minutes.

The only trace of the problem I've found is the difference in Intellisense's output for Max method in both cases: in MaxTyped it identifies method as IQueryable.Max, while in max used by MaxGeneric it says its IEnumerable.Max, what might mean EF is doing Max on all loaded entities and not in database. I tried casting set to IQueryable<T>, but nothing changed. Why there's such a difference? How to fix it?

Upvotes: 0

Views: 132

Answers (1)

xanatos
xanatos

Reputation: 111910

Because the MaxTyped is executed "server side", so only a single row is returned, while MaxGeneric is executed "client side", so all the rows of the table are returned, "deserialized" to int and the "max(ed)" (this last step is the fastest one)

The real cost is in the passage of data between the SQL Server and the .NET application: less data has to be passed, the better it's.

Now try

static int max<T>(DbSet<T> set, Expression<Func<T, int>> func) where T : class
{
    // intellisense says its IEnumerable.Max
    return set.Max(func);
}

and see if set.Max is using the IEnumerable<T>.Max() or the IQueriable<T>.Max(). If it's using the second one, its speed should be around 0.5 secs.

The difference is that IQueriable<T>.Max() accepts only Expression<Func<T, int>>, while IEnumerable<T>.Max() accepts Func<T, int>, and when you write p => p.PostId this can be an Expression<Func<>> or a Func<> depending on the type of variable it's put in.

Upvotes: 3

Related Questions