Reputation: 1825
I have a scenario that is causing very long-running processes, and I suspect the cause is our usage of LINQ to Entities.
BACKGROUND: The project uses LINQ to Entities and the Repository pattern to expose data to our logic layer. That is how it is and won't be changing.
ISSUE: A scenario has cropped up that requires selecting intersecting data from both inputs and other tables. Trying to optimise this, I first queried the DB for an array of IDs that I intend to use for getting the intersecting data. I also have another array of integers to use in my LINQ expression. I then build an expression that selects data from the table in question using the LINQ Where and Contains methods. This takes a long time to execute: almost a minute.
To combat this, I have tried several LINQ techniques that all take almost the same time. For information's sake, below is a sample of some of my approaches.
// FYI: tableTotalsIds contains 14,856 IDs as an example, built from a repository call
var tableTotalsIds = tableTotals.Select(s => s.Id).ToArray();
int[] ages = { 25, 26, 27 };

Expression<Func<TotalAgeCounts, bool>> ageFilter =
    af => af.TableTotalsId != null &&
          tableTotalsIds.Contains(af.TableTotalsId.Value) &&
          ages.Contains(af.Age);

// Approach 1: query the context directly
var directStartTime = DateTime.Now;
var directFetch = _ctx.TotalAgeCounts.Where(ageFilter).ToList();
var directBenchMark = DateTime.Now.Subtract(directStartTime).TotalSeconds;

// Approach 2: go through the repository/specification layer
var repositoryStartTime = DateTime.Now;
var repositoryFetch = _totalAgeCountsRepository
    .SelectAll(new Specification<TotalAgeCounts>(ageFilter));
var repositoryBenchMark = DateTime.Now.Subtract(repositoryStartTime).TotalSeconds;
In all cases, the query takes about a minute. The thing that jumps out at me is the huge number of tableTotalsIds being passed to .Contains(), but I am not aware of another LINQ way of achieving this.
Is there a more optimised way of doing this in LINQ?
At the moment I am thinking of dropping this query back into the DB as a simple join and skipping the LINQ bottleneck here. But first I will try pulling the unfiltered data into memory, then using LINQ to join the data together, and see how efficient that is.
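For reference, that join would not have to leave LINQ: written against IQueryable sources it should still execute as a single SQL join. A rough sketch, assuming the context exposes a TableTotals set matching the entities above:

int[] ages = { 25, 26, 27 };

// Both filters and the join are translated to SQL; nothing is materialised
// until ToList(), so no ID array ever crosses into memory.
var joined = (from ac in _ctx.TotalAgeCounts
              where ac.TableTotalsId != null && ages.Contains(ac.Age)
              join tt in _ctx.TableTotals on ac.TableTotalsId.Value equals tt.Id
              select ac).ToList();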
I am interested in how others have overcome similar bottlenecks without rewriting the architecture of the application.
SOLUTION
As the commenters pointed out, query composition was not happening because of my .ToArray(). The problem went deeper: I was building tableTotalsIds through our Repository implementation, which already cast the result to an IList, losing any further LINQ-to-SQL composition. Bypassing our Repository implementation when building tableTotalsIds and querying the dataContext directly, leaving the result as an IQueryable, fixed the issue.
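In sketch form (set name assumed), the working version keeps everything composable:

// tableTotalsIds stays an IQueryable<int>, so EF folds it into the outer
// query as a subquery instead of a 14,856-literal IN clause.
IQueryable<int> tableTotalsIds = _ctx.TableTotals.Select(s => s.Id); // no ToArray(), no repository IList

int[] ages = { 25, 26, 27 };
var results = _ctx.TotalAgeCounts
    .Where(af => af.TableTotalsId != null &&
                 tableTotalsIds.Contains(af.TableTotalsId.Value) &&
                 ages.Contains(af.Age))
    .ToList(); // single round trip; the ID filter runs inside the database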
Upvotes: 1
Views: 194
Reputation: 40160
The fact that you are calling ToArray
is causing the filtering records to be pulled out of the DB, only to be injected back in as part of your query. This prevents the query optimizer from making good use of the fact that it already had what it needed right there. As you noted in the comments, removing the ToList/ToArray helped.
As for the repository pattern, there's no reason you can't use it. You just don't need a separate repository for every class; only for the important root objects that you will query for.
In this case, your helper table's info can just be rolled up into the same query; the repository pattern does not demand you create a separate repository for it.
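As a rough sketch (names assumed), a single method on the root repository can issue one composed query, so the helper table never needs its own repository:

// One composed query on the root entity; the helper-table check becomes a
// correlated EXISTS in SQL rather than a second fetch.
public IList<TotalAgeCounts> SelectByAges(int[] ages)
{
    return _ctx.TotalAgeCounts
        .Where(ac => ages.Contains(ac.Age) &&
                     _ctx.TableTotals.Any(tt => tt.Id == ac.TableTotalsId))
        .ToList();
}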
Upvotes: 4