Beginner
Beginner

Reputation: 29533

Speeding up a linq entitiy query

I currently have a query which not takes a long time and sometimes crashes because of the amount of data in the database.

Can someone notice anything i can do to help speed it up?

public IList<Report> GetReport(CmsEntities context, long manufacturerId, long? regionId, long? vehicleTypeId)
        {
            var now = DateTime.Now;
            var today = new DateTime(now.Year, now.Month, 1);
            var date1monthago = today.AddMonths(-1);
            var date2monthago = today.AddMonths(-2);
            var date3monthago = today.AddMonths(-3);
            var date4monthago = today.AddMonths(-4);
            var date5monthago = today.AddMonths(-5);
            var date6monthago = today.AddMonths(-6);
            today = TimeManager.EndOfDay(new DateTime(now.AddMonths(-1).Year, today.AddMonths(-1).Month, DateTime.DaysInMonth(now.Year, today.AddMonths(-1).Month)));             
            var query = from item in context.Invoices
                         where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.ManufacturerId).Contains(manufacturerId)
                         && (item.InvoiceDate >= date6monthago && item.InvoiceDate <= today)
                         && (regionId.HasValue && regionId.Value > 0 ? item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.RegionId).Contains(regionId.Value) : true)
                         && (item.InvType == "I" || item.InvType == null)
                         && (vehicleTypeId.HasValue && vehicleTypeId.Value > 0 ? item.Repair.Job.Vehicle.Model.VehicleTypes.Select(x => x.Id).Contains(vehicleTypeId.Value) : true)
                         select item;

            var query2 = from item in query
                         group item by new { item.Repair.Job.Bodyshop } into g
                         let manufJobs = query.Where(x => x.Repair.Job.Vehicle.Model.ManufacturerId == manufacturerId && x.Repair.Job.BodyshopId == g.Key.Bodyshop.Id)
                         let allJobs = query.Where(x => x.Repair.Job.BodyshopId == g.Key.Bodyshop.Id)
                         select new tReport
                         {                                     
    MonthSixManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthSixJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),

    MonthFiveManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date5monthago.Month && x.InvoiceDate.Year == date5monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthFiveJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date5monthago.Month && x.InvoiceDate.Year == date5monthago.Year).GroupBy(x => x.Repair.Job).Count(),

    MonthFourManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date4monthago.Month && x.InvoiceDate.Year == date4monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthFourJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date4monthago.Month && x.InvoiceDate.Year == date4monthago.Year).GroupBy(x => x.Repair.Job).Count(),

    MonthThreeManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date3monthago.Month && x.InvoiceDate.Year == date3monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthThreeJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date3monthago.Month && x.InvoiceDate.Year == date3monthago.Year).GroupBy(x => x.Repair.Job).Count(),

    MonthTwoManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date2monthago.Month && x.InvoiceDate.Year == date2monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthTwoJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date2monthago.Month && x.InvoiceDate.Year == date2monthago.Year).GroupBy(x => x.Repair.Job).Count(),

    MonthOneManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date1monthago.Month && x.InvoiceDate.Year == date1monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthOneJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date1monthago.Month && x.InvoiceDate.Year == date1monthago.Year).GroupBy(x => x.Repair.Job).Count(),

    ManufTotal = manufJobs.GroupBy(x => x.Repair.Job).Count(),
    Total = allJobs.GroupBy(x => x.Repair.Job).Count(),

    PercentageOf = ((decimal)manufJobs.GroupBy(x => x.Repair.Job).Count() / (decimal)allJobs.GroupBy(x => x.Repair.Job).Count()) * 100
                         };

            return query2.OrderBy(x => x).ToList();
        }

EDIT

var query = from item in context.Invoices.AsNoTracking()
                    where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(x => x.ManufacturerId == manufacturerId)
                    && (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
                    && (item.InvType == "I" || item.InvType == null)
                    select item;

        if (regionId.HasValue && regionId.Value > 0)
        {
            query = query.Where(item => item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.RegionId).Contains(regionId.Value));
        }

        if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
        {
            query = query.Where(item => item.Repair.Job.Vehicle.Model.VehicleTypes.Select(x => x.Id).Contains(vehicleTypeId.Value));
        }


              var query2 = from item in hey
                     group item by new { item.Repair.Job.Bodyshop, item.InvoiceDate.Month } into m
                     select new TReport
                     {
                         Bodyshop = m.Key.Bodyshop.Name,
                         Bays = m.Key.Bodyshop.Bays,
                         Region = m.Key.Bodyshop.Manufacturer2Bodyshop.FirstOrDefault(x => x.ManufacturerId == manufacturerId).Region.Name,
                         BodyshopCode = m.Key.Bodyshop.Manufacturer2Bodyshop.FirstOrDefault(x => x.ManufacturerId == manufacturerId).BodyshopCode,
                         Total = m.Count(),
                         ManufTotal = m.Where(x => x.Repair.Job.Vehicle.Model.ManufacturerId == manufacturerId).Count(),
                         Totals = m.GroupBy(j => j.InvoiceDate.Month).Select(j => new TPercentReportInner
                         {
                             Month = j.Key,
                             ManufTotal = j.Where(x => x.Repair.Job.Vehicle.Model.ManufacturerId == manufacturerId).Count(),
                             AllTotal = j.Count()
                         })
                     };

Ive cut the query down. But even this is performing now worse than before?

Upvotes: 7

Views: 696

Answers (6)

Ivan Stoev
Ivan Stoev

Reputation: 205559

Along with the elimination of the unnecessary where conditions, I'll put my bet on the group by clause.

You are including item.Repair.Job.Bodyshop as one of the grouping fields. Anytime you use something like this, EF will generate a SQL GROUP BY clause including all the fields from the corresponding table. I don't know how many columns do you have in the db table that corresponds to your Bodyshop entity, but in any case using it this way most probably will not allow creating a good SQL execution plan.

I would suggest you trying the following equivalent of your cut down query:

var query = context.Invoices.AsNoTracking().Where(item =>
    (item.InvType == "I" || item.InvType == null) &&
    (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today));

if (regionId.HasValue && regionId.Value > 0)
    query = query.Where(item =>
        item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(source =>
            source.ManufacturerId == manufacturerId && source.RegionId == regionId.Value));
else
    query = query.Where(item => 
        item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(source => 
            source.ManufacturerId == manufacturerId));

if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
    query = query.Where(item =>
        item.Repair.Job.Vehicle.Model.VehicleTypes.Any(vehicleType => 
            vehicleType.Id == vehicleTypeId.Value);

var query2 = query
    .GroupBy(item => new { Month = item.InvoiceDate.Month, BodyshopId = item.Repair.Job.Bodyshop.Id })
    .Select(g => new TReport { BodyshopId = g.Key.BodyshopId, Month = g.Key.Month, MonthAllJobTotal = g.Count() });

var result = query2.ToList();

Upvotes: 0

Robert McKee
Robert McKee

Reputation: 21487

I would start by removing the hardcoded optional conditionals from your query, which will allow the query optimizer to use different query plans based on the parameters you have, like:

var query = from item in context.Invoices.AsNoTracking()
            where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.ManufacturerId).Contains(manufacturerId)
            && (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
            && (item.InvType == "I" || item.InvType == null)
            select item;

if (regionId.HasValue && regionId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.RegionId).Contains(regionId.Value));

if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Select(x => x.Id).Contains(vehicleTypeId.Value));

var query2 = from item in query
             group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
             select new TReport
             {
                 BodyshopId = g.Key.Bodyshop.Id,  
                 Month = g.Key.Month,
                 MonthAllJobTotal = g.Count()
             };

return query2.ToList();

You could also check to see if converting .Select(x=>x.id).Contains(id) or .Any(x=>x.Id==id) performs faster although I would think they would be similar in query plan and execution speed. That would give you:

var query = from item in context.Invoices.AsNoTracking()
            where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId)
            && (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
            && (item.InvType == "I" || item.InvType == null)
            select item;

if (regionId.HasValue && regionId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));

if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));

var query2 = from item in query
             group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
             select new TReport
             {
                 BodyshopId = g.Key.Bodyshop.Id,  
                 Month = g.Key.Month,
                 MonthAllJobTotal = g.Count()
             };

return query2.ToList();

Based on what you have, I would guess the .AsNoTracking() is doing very little for you, but it couldn't hurt. It has a bigger effect when retrieving large numbers of entities which this doesn't appear to be doing.

I would then clean up and standardize your query by removing the hardcoded ManufacturerId as well, which would give you:

var query = from item in context.Invoices.AsNoTracking()
            where (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
            && (item.InvType == "I" || item.InvType == null)
            select item;

if (manufacturerId.HasValue && manufacturerId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId));

if (regionId.HasValue && regionId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));

if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
    query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));

var query2 = from item in query
             group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
             select new TReport
             {
                 BodyshopId = g.Key.Bodyshop.Id,  
                 Month = g.Key.Month,
                 MonthAllJobTotal = g.Count()
             };

return query2.ToList();

and then lastly, I would return an IQueryable instead of List so that if you don't need one or more columns they can be dropped from the final query as well like:

public IQueryable<Report> GetReport(CmsEntities context, long? manufacturerId, long? regionId, long? vehicleTypeId)
    {
{
        var now = DateTime.Now;
        var today = new DateTime(now.Year, now.Month, 1);
        var date1monthago = today.AddMonths(-1);
        var date2monthago = today.AddMonths(-2);
        var date3monthago = today.AddMonths(-3);
        var date4monthago = today.AddMonths(-4);
        var date5monthago = today.AddMonths(-5);
        var date6monthago = today.AddMonths(-6);
        today = TimeManager.EndOfDay(new DateTime(now.AddMonths(-1).Year, today.AddMonths(-1).Month, DateTime.DaysInMonth(now.Year, today.AddMonths(-1).Month)));             

    var query = from item in context.Invoices.AsNoTracking()
                where (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
                && (item.InvType == "I" || item.InvType == null)
                select item;

    if (manufacturerId.HasValue && manufacturerId.Value > 0)
        query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId));

    if (regionId.HasValue && regionId.Value > 0)
        query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));

    if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
        query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));

    var query2 = from item in query
                 group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
                 select new TReport
                 {
                     BodyshopId = g.Key.Bodyshop.Id,  
                     Month = g.Key.Month,
                     MonthAllJobTotal = g.Count()
                 };

    return query2;
}

Then I would break these apart and convert these to extension methods:

public static class MyExtensions 
{
    public static IQueryable<Invoice> Recent(this IQueryable<Invoice> context,long? manufacturerId=null,long? regionId=null,long? vehicleId=null)
    {
        var now = DateTime.Now;
        var today = new DateTime(now.Year, now.Month, 1);
        var date1monthago = today.AddMonths(-1);
        var date2monthago = today.AddMonths(-2);
        var date3monthago = today.AddMonths(-3);
        var date4monthago = today.AddMonths(-4);
        var date5monthago = today.AddMonths(-5);
        var date6monthago = today.AddMonths(-6);
        today = TimeManager.EndOfDay(new DateTime(now.AddMonths(-1).Year, today.AddMonths(-1).Month, DateTime.DaysInMonth(now.Year, today.AddMonths(-1).Month)));             

        var query = from item in context.Invoices.AsNoTracking()
                where (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
                && (item.InvType == "I" || item.InvType == null)
                select item;

        if (manufacturerId.HasValue && manufacturerId.Value > 0)
            query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId));

        if (regionId.HasValue && regionId.Value > 0)
            query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));

        if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
            query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));
    return query;
}
public static IQueryable<Report> ToReport(this IQueryable<Invoice> context)
{
    return (from item in query
                 group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
                 select new TReport
                 {
                     BodyshopId = g.Key.Bodyshop.Id,  
                     Month = g.Key.Month,
                     MonthAllJobTotal = g.Count()
                 });

}
}

Now you can do the following:

var reports=db.Invoices.Recent.ToReport(); 

or

var reports=db.Invoices.Recent(ManufacturerEnum.Toyota).ToReport();

Upvotes: 5

sm14
sm14

Reputation: 119

I would suggest that you break it down into smaller bit size parts. Get the initial data into separate lists and then maybe sew them together.

You can get a lot of milage into ToDictionary and ToLookup extensions.

I bet if you took a look at the SQL you are sending to the database it would be insane.

Upvotes: 0

jrivor
jrivor

Reputation: 79

(Note: I'm basing this on the queries in your edit)

There are a couple things you could potentially try. First, you could simplify the use of .Select() with .Contains() to use .Any() instead. This could result in a simpler query generated by EF, although it's hard to tell without using SQL Server Profiler or Visual Studio Debugger to see the actual SQL query. For example, change .Select(x => x.ManufacturerId).Contains(manufacturerId) to .Any(x => x.ManufacturerId == manufacturerId).

The second thing to try is to execute the first query against the SQL Server and process the second query in the application. EF defers execution of a query until enumeration of the result occurs (e.g. .ToList() or foreach). So you could try (from item in context ... select item).ToList() on the first query which will cause the second query to execute in the application instead of on the SQL Server. This would help if the group by in the second query causes performance degradation if done by SQL Server instead of in the application itself.

However, if you try the second suggestion there may be a negative effect caused by the item.Repair.Job.Bodyshop grouping if it is a virtual navigation property because EF will have to get that object separately (as opposed to all in one query). This can be mitigated by changing the first query to context.Invoices.AsNoTracking().Include("Repair.Job") or context.Invoices.AsNoTracking().Include(x => x.Repair.Job) (the second option is not available older EF versions), and changing the second query to group item by new { item.InvoiceDate.Month, item.Repair.Job.BodyshopId }.

Upvotes: 0

Fab
Fab

Reputation: 14813

First, some general SQL optimization tips:

Before trying to do optimization, you should always profile. Profiling has the combined advantage of giving you objective description or where you stand in terms of performance and giving hints of where you should start your optimization effort. And as a free added benefit, at the end of the day, you can justify your hard work to your management with clear objective numbers (or even performance graphs for powerpoint lovers).

As some suggests, you could try to optimize in two different ways:

  • Optimize SQL plan (can be a time consuming task)

    1. Figure out what is the query in SQL.
    2. Run the query in SQL management studio or similar tool to define the best possible execution plan ( You may need to add indexes in the process). You should know from there if the optimization would be sufficient to match your performance criteria or not.
    3. Study current execution plan to check whether current query use most optimal indexes/keys, joins.
    4. Either modify code or use a intermediary SQL object (stored procedure or view) to make the linq to sql code use the optimized execution plan.
  • Optimize the way code process data (less time consuming task)

    1. Caching (this can be in several ways, but the principle is to fetch the data you need and process it in the code). It is especially efficient when the query optimizer is not able to manage efficiently your query (old database version / engine not so good).
    2. Compile Link to SQL query. Joe Albahari explains better than me.

Which approach you may favour first depends on merely on what performance gains you expect from it and what results you got from your profiling session.

By the look of the complexity of your linq to sql code, chances are that the generated sql query is sub optimal. On the other hand, linq in memory object management is quite fast compared to I/O with a remote SQL db.

Note that if you don't cache queries, it will fetch data each time. Caching could be done like that :

var allJobs = query.Where(x => x.Repair.Job.BodyshopId == g.Key.Bodyshop.Id).ToArray();

As a general rule of thumb, Keep It Simple Stupid and your query is not. You may refactor it a little as it seems there is quite redundant code:

MonthSixManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),
    MonthSixJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),

There is a quite widespread belief that by grouping all code in the same method (in a similar way as assembler code), you limit the number of method calls and then perform better in terms of performance. However the truth is counter-intuitive. Using the divide and conquer rule, a method chunked code will actually perform better in the end because it will be easier to maintain, to optimize, to enhance, to refactor. It was the whole point of software evolution from assembler to C#.

Upvotes: 3

Ferm&#237;n
Ferm&#237;n

Reputation: 745

You could implementen paging to avoid materialize all results. I mean, you could implement Skip and Take linq methods.

Simple example based on your code:

public IList<Report> GetReport(CmsEntities context, long manufacturerId, long? regionId, long? vehicleTypeId, int pageSize, int currentPage)
        {

        //Code removed to simplify

        return  query2.Skip(pageSize * currentPage).Take(pageSize );

        }

Upvotes: 3

Related Questions