Jules
Jules

Reputation: 7223

Why is Entity Framework having performance issues when calculating a sum

I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:

public decimal GetPortfolioValue(Guid portfolioId) {

    var portfolio = DbContext.Portfolios.FirstOrDefault( x => x.Id.Equals( portfolioId ) );
    if (portfolio == null) return 0m;

    return portfolio.Items
        .Where( i =>
            i.Status == ItemStatus.Listed
            &&
            _activateStatuses.Contains( i.Category.Status )
        )
        .Sum( i => i.Amount );
} 

So I want to fetch the value for all my items that have a certain status of which their parent has a specific status as well.

When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.

So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?

Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:

SELECT
    i.id, i.amount 
FROM
    items i 
    INNER JOIN categories c ON c.id = i.category_id
WHERE
    i.portfolio_id = @portfolioId
    AND
    i.status = 'listed'
    AND
    c.status IN ('active', 'pending', ...);

on which it could then calculate the sum (if it is not able to use the SUM directly in the query).

What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?

To be complete, here are my EF entities:

public class ItemConfiguration : EntityTypeConfiguration<Item> {
   ToTable("items");
   ...
   HasRequired(p => p.Portfolio);
}

public class CategoryConfiguration : EntityTypeConfiguration<Category> {
    ToTable("categories");
    ...
    HasMany(c => c.Products).WithRequired(p => p.Category);
}

EDIT based on comments:

I didn't think it was important but the _activeStatuses is a list of enums.

private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };

But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) but I map them to an enum used in the application. And that is probably why EF cannot evaluate it? The actual code is:

... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...

EDIT2

Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?

// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items.Where(i => i.Status == ItemStatusConstants.Listed && 
                _activeStatuses.Contains(i.Category.Status))
                .Select(i => i.Amount).Sum();

// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
                .SelectMany(p => p.Items.Where(i => 
                    i.Status == ItemStatusConstants.Listed &&
                    _activeStatuses.Contains(i.Category.Status)))
                    .Select(i => i.Amount).Sum();

The first query does a LOT of small SQL queries whereas the second one just combines everything into one bigger query. I'd expect even the first query to run one query to get the portfolio value.

Upvotes: 1

Views: 400

Answers (2)

phuzi
phuzi

Reputation: 13059

Yep CategoryStatusMapper.MapToEnum cannot be converted to SQL, forcing it to run the Where in .Net. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so the mapping is not required.

private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };

So that the contains becomes

... && _activateStatuses.Contains(i.Category.Status) ...

and can all be converted to SQL

UPDATE

Given that i.Category.Status is a string in the database, then

private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };

Upvotes: 1

Igor
Igor

Reputation: 62238

Calling portfolio.Items this will lazy load the collection in Items and then execute the subsequent calls including the Where and Sum expressions. See also Loading Related Entities article.

You need to execute the call directly on the DbContext the Sum expression can be evaluated database server side.

var portfolio = DbContext.Portfolios
    .Where(x => x.Id.Equals(portfolioId))
    .SelectMany(x => x.Items.Where(i => i.Status == ItemStatus.Listed && _activateStatuses.Contains( i.Category.Status )).Select(i => i.Amount))
    .Sum();

You also have to use the appropriate type for _activateStatuses instance as the contained values must match the type persisted in the database. If the database persists string values then you need to pass a list of string values.

var _activateStatuses = new string[] {"Active", "etc"};

You could use a Linq expression to convert enums to their string representative.


Notes

  • I would recommend you turn off lazy loading on your DbContext type. As soon as you do that you will start to catch issues like this at run time via Exceptions and can then write more performant code.
  • I did not include error checking for if no portfolio was found but you could extend this code accordingly.

Upvotes: 1

Related Questions