Reputation: 29533
I currently have a query which not takes a long time and sometimes crashes because of the amount of data in the database.
Can someone notice anything i can do to help speed it up?
public IList<Report> GetReport(CmsEntities context, long manufacturerId, long? regionId, long? vehicleTypeId)
{
var now = DateTime.Now;
var today = new DateTime(now.Year, now.Month, 1);
var date1monthago = today.AddMonths(-1);
var date2monthago = today.AddMonths(-2);
var date3monthago = today.AddMonths(-3);
var date4monthago = today.AddMonths(-4);
var date5monthago = today.AddMonths(-5);
var date6monthago = today.AddMonths(-6);
today = TimeManager.EndOfDay(new DateTime(now.AddMonths(-1).Year, today.AddMonths(-1).Month, DateTime.DaysInMonth(now.Year, today.AddMonths(-1).Month)));
var query = from item in context.Invoices
where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.ManufacturerId).Contains(manufacturerId)
&& (item.InvoiceDate >= date6monthago && item.InvoiceDate <= today)
&& (regionId.HasValue && regionId.Value > 0 ? item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.RegionId).Contains(regionId.Value) : true)
&& (item.InvType == "I" || item.InvType == null)
&& (vehicleTypeId.HasValue && vehicleTypeId.Value > 0 ? item.Repair.Job.Vehicle.Model.VehicleTypes.Select(x => x.Id).Contains(vehicleTypeId.Value) : true)
select item;
var query2 = from item in query
group item by new { item.Repair.Job.Bodyshop } into g
let manufJobs = query.Where(x => x.Repair.Job.Vehicle.Model.ManufacturerId == manufacturerId && x.Repair.Job.BodyshopId == g.Key.Bodyshop.Id)
let allJobs = query.Where(x => x.Repair.Job.BodyshopId == g.Key.Bodyshop.Id)
select new tReport
{
MonthSixManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthSixJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthFiveManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date5monthago.Month && x.InvoiceDate.Year == date5monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthFiveJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date5monthago.Month && x.InvoiceDate.Year == date5monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthFourManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date4monthago.Month && x.InvoiceDate.Year == date4monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthFourJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date4monthago.Month && x.InvoiceDate.Year == date4monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthThreeManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date3monthago.Month && x.InvoiceDate.Year == date3monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthThreeJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date3monthago.Month && x.InvoiceDate.Year == date3monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthTwoManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date2monthago.Month && x.InvoiceDate.Year == date2monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthTwoJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date2monthago.Month && x.InvoiceDate.Year == date2monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthOneManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date1monthago.Month && x.InvoiceDate.Year == date1monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthOneJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date1monthago.Month && x.InvoiceDate.Year == date1monthago.Year).GroupBy(x => x.Repair.Job).Count(),
ManufTotal = manufJobs.GroupBy(x => x.Repair.Job).Count(),
Total = allJobs.GroupBy(x => x.Repair.Job).Count(),
PercentageOf = ((decimal)manufJobs.GroupBy(x => x.Repair.Job).Count() / (decimal)allJobs.GroupBy(x => x.Repair.Job).Count()) * 100
};
return query2.OrderBy(x => x).ToList();
}
EDIT
var query = from item in context.Invoices.AsNoTracking()
where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(x => x.ManufacturerId == manufacturerId)
&& (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
&& (item.InvType == "I" || item.InvType == null)
select item;
if (regionId.HasValue && regionId.Value > 0)
{
query = query.Where(item => item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.RegionId).Contains(regionId.Value));
}
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
{
query = query.Where(item => item.Repair.Job.Vehicle.Model.VehicleTypes.Select(x => x.Id).Contains(vehicleTypeId.Value));
}
var query2 = from item in hey
group item by new { item.Repair.Job.Bodyshop, item.InvoiceDate.Month } into m
select new TReport
{
Bodyshop = m.Key.Bodyshop.Name,
Bays = m.Key.Bodyshop.Bays,
Region = m.Key.Bodyshop.Manufacturer2Bodyshop.FirstOrDefault(x => x.ManufacturerId == manufacturerId).Region.Name,
BodyshopCode = m.Key.Bodyshop.Manufacturer2Bodyshop.FirstOrDefault(x => x.ManufacturerId == manufacturerId).BodyshopCode,
Total = m.Count(),
ManufTotal = m.Where(x => x.Repair.Job.Vehicle.Model.ManufacturerId == manufacturerId).Count(),
Totals = m.GroupBy(j => j.InvoiceDate.Month).Select(j => new TPercentReportInner
{
Month = j.Key,
ManufTotal = j.Where(x => x.Repair.Job.Vehicle.Model.ManufacturerId == manufacturerId).Count(),
AllTotal = j.Count()
})
};
Ive cut the query down. But even this is performing now worse than before?
Upvotes: 7
Views: 696
Reputation: 205559
Along with the elimination of the unnecessary where
conditions, I'll put my bet on the group by
clause.
You are including item.Repair.Job.Bodyshop
as one of the grouping fields.
Anytime you use something like this, EF will generate a SQL GROUP BY
clause including all the fields from the corresponding table. I don't know how many columns do you have in the db table that corresponds to your Bodyshop
entity, but in any case using it this way most probably will not allow creating a good SQL execution plan.
I would suggest you trying the following equivalent of your cut down query:
var query = context.Invoices.AsNoTracking().Where(item =>
(item.InvType == "I" || item.InvType == null) &&
(item.InvoiceDate >= date12monthago && item.InvoiceDate <= today));
if (regionId.HasValue && regionId.Value > 0)
query = query.Where(item =>
item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(source =>
source.ManufacturerId == manufacturerId && source.RegionId == regionId.Value));
else
query = query.Where(item =>
item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(source =>
source.ManufacturerId == manufacturerId));
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
query = query.Where(item =>
item.Repair.Job.Vehicle.Model.VehicleTypes.Any(vehicleType =>
vehicleType.Id == vehicleTypeId.Value);
var query2 = query
.GroupBy(item => new { Month = item.InvoiceDate.Month, BodyshopId = item.Repair.Job.Bodyshop.Id })
.Select(g => new TReport { BodyshopId = g.Key.BodyshopId, Month = g.Key.Month, MonthAllJobTotal = g.Count() });
var result = query2.ToList();
Upvotes: 0
Reputation: 21487
I would start by removing the hardcoded optional conditionals from your query, which will allow the query optimizer to use different query plans based on the parameters you have, like:
var query = from item in context.Invoices.AsNoTracking()
where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.ManufacturerId).Contains(manufacturerId)
&& (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
&& (item.InvType == "I" || item.InvType == null)
select item;
if (regionId.HasValue && regionId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Select(x => x.RegionId).Contains(regionId.Value));
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Select(x => x.Id).Contains(vehicleTypeId.Value));
var query2 = from item in query
group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
select new TReport
{
BodyshopId = g.Key.Bodyshop.Id,
Month = g.Key.Month,
MonthAllJobTotal = g.Count()
};
return query2.ToList();
You could also check to see if converting .Select(x=>x.id).Contains(id)
or .Any(x=>x.Id==id)
performs faster although I would think they would be similar in query plan and execution speed. That would give you:
var query = from item in context.Invoices.AsNoTracking()
where item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId)
&& (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
&& (item.InvType == "I" || item.InvType == null)
select item;
if (regionId.HasValue && regionId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));
var query2 = from item in query
group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
select new TReport
{
BodyshopId = g.Key.Bodyshop.Id,
Month = g.Key.Month,
MonthAllJobTotal = g.Count()
};
return query2.ToList();
Based on what you have, I would guess the .AsNoTracking()
is doing very little for you, but it couldn't hurt. It has a bigger effect when retrieving large numbers of entities which this doesn't appear to be doing.
I would then clean up and standardize your query by removing the hardcoded ManufacturerId as well, which would give you:
var query = from item in context.Invoices.AsNoTracking()
where (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
&& (item.InvType == "I" || item.InvType == null)
select item;
if (manufacturerId.HasValue && manufacturerId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId));
if (regionId.HasValue && regionId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));
var query2 = from item in query
group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
select new TReport
{
BodyshopId = g.Key.Bodyshop.Id,
Month = g.Key.Month,
MonthAllJobTotal = g.Count()
};
return query2.ToList();
and then lastly, I would return an IQueryable instead of List so that if you don't need one or more columns they can be dropped from the final query as well like:
public IQueryable<Report> GetReport(CmsEntities context, long? manufacturerId, long? regionId, long? vehicleTypeId)
{
{
var now = DateTime.Now;
var today = new DateTime(now.Year, now.Month, 1);
var date1monthago = today.AddMonths(-1);
var date2monthago = today.AddMonths(-2);
var date3monthago = today.AddMonths(-3);
var date4monthago = today.AddMonths(-4);
var date5monthago = today.AddMonths(-5);
var date6monthago = today.AddMonths(-6);
today = TimeManager.EndOfDay(new DateTime(now.AddMonths(-1).Year, today.AddMonths(-1).Month, DateTime.DaysInMonth(now.Year, today.AddMonths(-1).Month)));
var query = from item in context.Invoices.AsNoTracking()
where (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
&& (item.InvType == "I" || item.InvType == null)
select item;
if (manufacturerId.HasValue && manufacturerId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId));
if (regionId.HasValue && regionId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));
var query2 = from item in query
group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
select new TReport
{
BodyshopId = g.Key.Bodyshop.Id,
Month = g.Key.Month,
MonthAllJobTotal = g.Count()
};
return query2;
}
Then I would break these apart and convert these to extension methods:
public static class MyExtensions
{
public static IQueryable<Invoice> Recent(this IQueryable<Invoice> context,long? manufacturerId=null,long? regionId=null,long? vehicleId=null)
{
var now = DateTime.Now;
var today = new DateTime(now.Year, now.Month, 1);
var date1monthago = today.AddMonths(-1);
var date2monthago = today.AddMonths(-2);
var date3monthago = today.AddMonths(-3);
var date4monthago = today.AddMonths(-4);
var date5monthago = today.AddMonths(-5);
var date6monthago = today.AddMonths(-6);
today = TimeManager.EndOfDay(new DateTime(now.AddMonths(-1).Year, today.AddMonths(-1).Month, DateTime.DaysInMonth(now.Year, today.AddMonths(-1).Month)));
var query = from item in context.Invoices.AsNoTracking()
where (item.InvoiceDate >= date12monthago && item.InvoiceDate <= today)
&& (item.InvType == "I" || item.InvType == null)
select item;
if (manufacturerId.HasValue && manufacturerId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.ManufacturerId==manufacturerId));
if (regionId.HasValue && regionId.Value > 0)
query=query.Where(item=>item.Repair.Job.Bodyshop.Manufacturer2Bodyshop.Any(m=>m.RegionId==regionId.Value));
if (vehicleTypeId.HasValue && vehicleTypeId.Value > 0)
query=query.Where(item=>item.Repair.Job.Vehicle.Model.VehicleTypes.Any(v=>v.Id==vehicleTypeId.Value));
return query;
}
public static IQueryable<Report> ToReport(this IQueryable<Invoice> context)
{
return (from item in query
group item by new { item.InvoiceDate.Month, item.Repair.Job.Bodyshop } into g
select new TReport
{
BodyshopId = g.Key.Bodyshop.Id,
Month = g.Key.Month,
MonthAllJobTotal = g.Count()
});
}
}
Now you can do the following:
var reports=db.Invoices.Recent.ToReport();
or
var reports=db.Invoices.Recent(ManufacturerEnum.Toyota).ToReport();
Upvotes: 5
Reputation: 119
I would suggest that you break it down into smaller bit size parts. Get the initial data into separate lists and then maybe sew them together.
You can get a lot of milage into ToDictionary and ToLookup extensions.
I bet if you took a look at the SQL you are sending to the database it would be insane.
Upvotes: 0
Reputation: 79
(Note: I'm basing this on the queries in your edit)
There are a couple things you could potentially try. First, you could simplify the use of .Select()
with .Contains()
to use .Any()
instead. This could result in a simpler query generated by EF, although it's hard to tell without using SQL Server Profiler or Visual Studio Debugger to see the actual SQL query. For example, change .Select(x => x.ManufacturerId).Contains(manufacturerId)
to
.Any(x => x.ManufacturerId == manufacturerId)
.
The second thing to try is to execute the first query against the SQL Server and process the second query in the application. EF defers execution of a query until enumeration of the result occurs (e.g. .ToList()
or foreach
). So you could try (from item in context ... select item).ToList()
on the first query which will cause the second query to execute in the application instead of on the SQL Server. This would help if the group by
in the second query causes performance degradation if done by SQL Server instead of in the application itself.
However, if you try the second suggestion there may be a negative effect caused by the item.Repair.Job.Bodyshop
grouping if it is a virtual
navigation property because EF will have to get that object separately (as opposed to all in one query). This can be mitigated by changing the first query to context.Invoices.AsNoTracking().Include("Repair.Job")
or context.Invoices.AsNoTracking().Include(x => x.Repair.Job)
(the second option is not available older EF versions), and changing the second query to group item by new { item.InvoiceDate.Month, item.Repair.Job.BodyshopId }
.
Upvotes: 0
Reputation: 14813
First, some general SQL optimization tips:
Before trying to do optimization, you should always profile. Profiling has the combined advantage of giving you objective description or where you stand in terms of performance and giving hints of where you should start your optimization effort. And as a free added benefit, at the end of the day, you can justify your hard work to your management with clear objective numbers (or even performance graphs for powerpoint lovers).
As some suggests, you could try to optimize in two different ways:
Optimize SQL plan (can be a time consuming task)
Optimize the way code process data (less time consuming task)
Which approach you may favour first depends on merely on what performance gains you expect from it and what results you got from your profiling session.
By the look of the complexity of your linq to sql code, chances are that the generated sql query is sub optimal. On the other hand, linq in memory object management is quite fast compared to I/O with a remote SQL db.
Note that if you don't cache queries, it will fetch data each time. Caching could be done like that :
var allJobs = query.Where(x => x.Repair.Job.BodyshopId == g.Key.Bodyshop.Id).ToArray();
As a general rule of thumb, Keep It Simple Stupid and your query is not. You may refactor it a little as it seems there is quite redundant code:
MonthSixManufJobTotal = manufJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),
MonthSixJobTotal = allJobs.Where(x => x.InvoiceDate.Month == date6monthago.Month && x.InvoiceDate.Year == date6monthago.Year).GroupBy(x => x.Repair.Job).Count(),
There is a quite widespread belief that by grouping all code in the same method (in a similar way as assembler code), you limit the number of method calls and then perform better in terms of performance. However the truth is counter-intuitive. Using the divide and conquer rule, a method chunked code will actually perform better in the end because it will be easier to maintain, to optimize, to enhance, to refactor. It was the whole point of software evolution from assembler to C#.
Upvotes: 3
Reputation: 745
You could implementen paging to avoid materialize all results. I mean, you could implement Skip and Take linq methods.
Simple example based on your code:
public IList<Report> GetReport(CmsEntities context, long manufacturerId, long? regionId, long? vehicleTypeId, int pageSize, int currentPage)
{
//Code removed to simplify
return query2.Skip(pageSize * currentPage).Take(pageSize );
}
Upvotes: 3