Reputation: 303
I need to fetch a lot of records from a SQL Server database with EF6. The problem that its takes a lot of time. The main problem is entity called Series
which contains Measurements
. There is like 250K of them and each has 2 nested entities called FrontDropPhoto
and SideDropPhoto
.
[Table("Series")]
public class DbSeries
{
[Key] public Guid SeriesId { get; set; }
public List<DbMeasurement> MeasurementsSeries { get; set; }
}
[Table("Measurements")]
public class DbMeasurement
{
[Key] public Guid MeasurementId { get; set; }
public Guid CurrentSeriesId { get; set; }
public DbSeries CurrentSeries { get; set; }
public Guid? SideDropPhotoId { get; set; }
[ForeignKey("SideDropPhotoId")]
public virtual DbDropPhoto SideDropPhoto { get; set; }
public Guid? FrontDropPhotoId { get; set; }
[ForeignKey("FrontDropPhotoId")]
public virtual DbDropPhoto FrontDropPhoto { get; set; }
}
[Table("DropPhotos")]
public class DbDropPhoto
{
[Key] public Guid PhotoId { get; set; }
}
I've wrote fetch method like this (Most of the properties omitted for clarity):
public async Task<List<DbSeries>> GetSeriesByUserId(Guid dbUserId)
{
using (var context = new DDropContext())
{
try
{
var loadedSeries = await context.Series
.Where(x => x.CurrentUserId == dbUserId)
.Select(x => new
{
x.SeriesId,
}).ToListAsync();
var dbSeries = new List<DbSeries>();
foreach (var series in loadedSeries)
{
var seriesToAdd = new DbSeries
{
SeriesId = series.SeriesId,
};
seriesToAdd.MeasurementsSeries = await GetMeasurements(seriesToAdd);
dbSeries.Add(seriesToAdd);
}
return dbSeries;
}
catch (SqlException e)
{
throw new TimeoutException(e.Message, e);
}
}
}
public async Task<List<DbMeasurement>> GetMeasurements(DbSeries series)
{
using (var context = new DDropContext())
{
var measurementForSeries = await context.Measurements.Where(x => x.CurrentSeriesId == series.SeriesId)
.Select(x => new
{
x.CurrentSeries,
x.CurrentSeriesId,
x.MeasurementId,
})
.ToListAsync();
var dbMeasurementsForAdd = new List<DbMeasurement>();
foreach (var measurement in measurementForSeries)
{
var measurementToAdd = new DbMeasurement
{
CurrentSeries = series,
MeasurementId = measurement.MeasurementId,
FrontDropPhotoId = measurement.FrontDropPhotoId,
FrontDropPhoto = measurement.FrontDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.FrontDropPhotoId.Value)
: null,
SideDropPhotoId = measurement.SideDropPhotoId,
SideDropPhoto = measurement.SideDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.SideDropPhotoId.Value)
: null,
};
dbMeasurementsForAdd.Add(measurementToAdd);
}
return dbMeasurementsForAdd;
}
}
private async Task<DbDropPhoto> GetDbDropPhotoById(Guid photoId)
{
using (var context = new DDropContext())
{
var dropPhoto = await context.DropPhotos
.Where(x => x.PhotoId == photoId)
.Select(x => new
{
x.PhotoId,
}).FirstOrDefaultAsync();
if (dropPhoto == null)
{
return null;
}
var dbDropPhoto = new DbDropPhoto
{
PhotoId = dropPhoto.PhotoId,
};
return dbDropPhoto;
}
}
Relationships configured via FluentAPI:
modelBuilder.Entity<DbSeries>()
.HasMany(s => s.MeasurementsSeries)
.WithRequired(g => g.CurrentSeries)
.HasForeignKey(s => s.CurrentSeriesId)
.WillCascadeOnDelete();
modelBuilder.Entity<DbMeasurement>()
.HasOptional(c => c.FrontDropPhoto)
.WithMany()
.HasForeignKey(s => s.FrontDropPhotoId);
modelBuilder.Entity<DbMeasurement>()
.HasOptional(c => c.SideDropPhoto)
.WithMany()
.HasForeignKey(s => s.SideDropPhotoId);
I need all of this data to populate WPF DataGrid. The obvious solution is to add paging to this DataGrid. This solution is tempting but it will break the logic of my application badly. I want to create plots at runtime using this data, so I need all of it, not just some parts. I've tried to optimize it a bit by make every method to use async await, but it wasn't helpful enough. I've tried to add
.Configuration.AutoDetectChangesEnabled = false;
for each context, but loading time is still really long. How to approach this problem?
Upvotes: 2
Views: 942
Reputation: 16498
Why are you reinventing the wheel and manually loading and constructing your related entities? You’re causing an N+1 selects problem resulting in abhorrent performance. Let EF query for related entities efficiently via .Include
Example:
var results = context.Series
.AsNoTracking()
.Include( s => s.MeasurementSeries )
.ThenInclude( ms => ms.FrontDropPhoto )
.Where( ... )
.ToList(); // should use async
This will speed up execution dramatically though it may still not be quick enough for your requirments if it needs to construct hundreds of thousands to millions of objects, in which case you can retrieve the data in concurrent batches.
Upvotes: 0
Reputation: 34653
Firstly, async
/await
will not help you here. It isn't a "go faster" type of operation, it is about accommodating systems that "can be doing something else while this operation is computing". If anything, it makes an operation slower in exchange for making a system more responsive.
My recommendation would be to separate your concerns: On the one hand you want to display detailed data. On the other hand you want to plot an overall graph. Separate these. A user doesn't need to see details for every record at one time, paginating it server-side will greatly reduce the raw amount of data at any one time. Graphs want to see all data, but they don't care about "heavy" details like bitmaps.
The next thing would be to separate your view's model from your domain model (entity). Doing stuff like:
var measurementToAdd = new DbMeasurement
{
CurrentSeries = series,
MeasurementId = measurement.MeasurementId,
FrontDropPhotoId = measurement.FrontDropPhotoId,
FrontDropPhoto = measurement.FrontDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.FrontDropPhotoId.Value)
: null,
SideDropPhotoId = measurement.SideDropPhotoId,
SideDropPhoto = measurement.SideDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.SideDropPhotoId.Value)
: null,
};
... is just asking for trouble. Any code that accepts a DbMeasurement should receive a complete, or completable DbMeasurement, not a partially populated entity. It will burn you in the future. Define a view model for the data grid and populate it. This way you clearly differentiate what is an entity model and what is the view's model.
Next, for the data grid, absolutely implement server-side pagination:
public ICollection<MeasurementViewModel> GetMeasurements(int seriesId, int pageNumber, int pageSize)
{
using (var context = new DDropContext())
{
var measurementsForSeries = await context.Measurements
.Where(x => x.CurrentSeriesId == seriesId)
.Select(x => new MeasurementViewModel
{
MeasurementId = x.MeasurementId,
FromDropPhoto = x.FromDropPhoto.ImageData,
SideDropPhoto = x.SideDropPhoto.ImageData
})
.Skip(pageNumber*pageSize)
.Take(pageSize)
.ToList();
return measurementsForSeries;
}
}
This assumes that we want to pull image data for the rows if available. Leverage the navigation properties for related data in the query rather than iterating over results and going back to the database for each and every row.
For the graph plot you can return either the raw integer data or a data structure for just the fields needed rather than relying on the data returned for the grid. It can be pulled for the entire table without having the "heavy" image data. It may seem counter-productive to go to the database when the data might already be loaded once already, but the result is two highly efficient queries rather than one very inefficient query trying to serve two purposes.
Upvotes: 0
Reputation: 4048
Other than the very large amount of data that you are intent on returning, the main problem is that the way your code is structured means that for each of the 250,000 Series
you are performing another trip to the database to get the Measurements
for the Series
and a further 2 trips to get the front/side DropPhotos
for each Measurement
. Apart from the round-trip time for the 750,000 calls this completely avoids taking advantage of SQL's set-based performance optimisations.
Try to ensure that EF submits as few queries as possible to return your data, preferably one:
var loadedSeries = await context.Series
.Where(x => x.CurrentUserId == dbUserId)
.Select(x => new DbSeries
{
SeriesId = x.SeriesId,
MeasurementsSeries = x.MeasurementsSeries.Select(ms => new DbMeasurement
{
MeasurementId = ms.MeasurementId,
FrontDropPhotoId = ms.FrontDropPhotoId,
FrontDropPhoto = new DbDropPhoto
{
PhotoId = ms.FrontDropPhotoId
},
SideDropPhotoId = ms.SideDropPhotoId,
SideDropPhoto = new DbDropPhoto
{
PhotoId = ms.SideDropPhotoId
},
})
}).ToListAsync();
Upvotes: 2