infroz
infroz

Reputation: 1162

Linq to SQL/Entities: Greatest N-Per group problem/performance increase

Allright, So I have too encountered what I believe is the Greatest-N-Per problem, whereas this question has been answered before I do not think it has been solved well yet with Linq. I have a table with a few million entries, so therefore queries take a lot of time. I would like these queries to take less than a second, whereas currently they spend about 10 seconds to infinity.

var query = 
        from MD in _context.MeasureDevice
        where MD.DbdistributorMap.DbcustomerId == 6 // Filter the devices based on customer
        select new
        {
            DbMeasureDeviceId = MD.DbMeasureDeviceId,
            // includes measurements and alarms which have 1-m and 1-m relations
            Measure = _context.Measures.Include(e=> e.MeasureAlarms)
                .FirstOrDefault(e => e.DbMeasureDeviceId == MD.DbMeasureDeviceId && e.MeasureTimeStamp == _context.Measures
                    .Where(x => x.DbMeasureDeviceId == MD.DbMeasureDeviceId)
                    .Max(e=> e.MeasureTimeStamp)),
            Address = MD.Dbaddress // includes address 1-1 relation
        };

In this query I'm selecting data from 4 different tables. Firstly the MeasureDevice table which is the primary entity im after. Secondly I want the latest measurement from the measures table, which should also include alarms from another table if any exist. Lastly I need the address of the device, which is located in its own table.

There are a few thousand devices, but they have between themselves several thousands of measures which amount to several million rows in the measurement table.

I wonder if anyone has any knowledge as to either improve the performance of Linq queries using EF5, or any better method for solving the Greatest-N-Per problem. I've analyzed the query using Microsoft SQL Server Manager and the most time is spent fetching the measurements.

Query generated as requested:

SELECT [w].[DBMeasureDeviceID], [t].[DBMeasureID], [t].[AlarmDBAlarmID], [t].[batteryValue], [t].[DBMeasureDeviceID], [t].[MeasureTimeStamp], [t].[Stand], [t].[Temperature], [t].[c], [a].[DBAddressID], [a].[AmountAvtalenummere],
[a].[DBOwnerID], [a].[Gate], [a].[HouseCharacter], [a].[HouseNumber], [a].[Latitude], [a].[Longitude], [d].[DBDistributorMapID], [m1].[DBMeasureID], [m1].[DBAlarmID], [m1].[AlarmDBAlarmID], [m1].[MeasureDBMeasureID]
      FROM [MeasureDevice] AS [w]
      INNER JOIN [DistribrutorMap] AS [d] ON [w].[DBDistributorMapID] = [d].[DBDistributorMapID]
      LEFT JOIN [Address] AS [a] ON [w].[DBAddressID] = [a].[DBAddressID]
      OUTER APPLY (
          SELECT TOP(1) [m].[DBMeasureID], [m].[AlarmDBAlarmID], [m].[batteryValue], [m].[DBMeasureDeviceID], [m].[MeasureTimeStamp], [m].[Stand], [m].[Temperature], 1 AS [c]
          FROM [Measure] AS [m]
          WHERE ([m].[MeasureTimeStamp] = (
              SELECT MAX([m0].[MeasureTimeStamp])
              FROM [Measure] AS [m0]
              WHERE [m0].[DBMeasureDeviceID] = [w].[DBMeasureDeviceID])) AND ([w].[DBMeasureDeviceID] = [m].[DBMeasureDeviceID])
      ) AS [t]
      LEFT JOIN [MeasureAlarm] AS [m1] ON [t].[DBMeasureID] = [m1].[MeasureDBMeasureID]
      WHERE [d].[DBCustomerID] = 6
      ORDER BY [w].[DBMeasureDeviceID], [d].[DBDistributorMapID], [a].[DBAddressID], [t].[DBMeasureID], [m1].[DBMeasureID], [m1].[DBAlarmID]

Entity Relations

enter image description here

Upvotes: 0

Views: 110

Answers (1)

Steve Py
Steve Py

Reputation: 34908

You have navigation properties defined, so it stands that MeasureDevice should have a reference to it's Measures:

var query = _context.MeasureDevice
    .Include(md => md.Measures.Select(m => m.MeasureAlarms)
    .Where(md => md.DbDistributorMap.DbCustomerId == 6)
    .Select(md => new 
    {
        DbMeasureDeviceId = md.DbMeasureDeviceId,
        Measure = md.Measures.OrderByDescending(m => m.MeasureTimeStamp).FirstOrDefault(),
        Address = md.Address
    });

The possible bugbear here is including the MeasureAlarms with the required Measure. AFAIK you cannot put an .Include() within a .Select() (Where we might have tried Measure = md.Measures.Include(m => m.MeasureAlarms)...

Caveat: It has been quite a while since I have used EF 5 (Unless you are referring to EF Core 5) If you are using the (very old) EF5 in your project I would recommend arguing for the upgrade to EF6 given EF6 did bring a number of performance and capability improvements to EF5. If you are instead using EF Core 5, the Include statement above would be slightly different:

.Include(md => md.Measures).ThenInclude(m => m.MeasureAlarms)

Rather than returning entities, my go-to advice is to use Projection to select precisely the data we need. That way we don't need to worry about eager or lazy loading. If there are details about the Measure and MeasureAlarms we need:

var query = _context.MeasureDevice
    .Where(md => md.DbDistributorMap.DbCustomerId == 6)
    .Select(md => new 
    {
        md.DbMeasureDeviceId,
        Measure = md.Measures
            .Select(m => new 
            {
                m.MeasureId,
                m.MeasureTimestamp,

                // any additional needed fields from Measure

                Address = m.Address.Select(a => new 
                {
                   // Assuming Address is an entity, any needed fields from Address.
                }),
                Alarms = m.MeasureAlarms.Select(ma => new 
                {
                    ma.MeasureAlarmId,
                    ma.Label // etc. Whatever fields needed from Alarm...
                }).ToList()
            }).OrderByDescending(m => m.MeasureTimestamp)
            .FirstOrDefault()
    });

This example selects anonymous types, alternatively you can define DTOs/ViewModels and can leverage libraries like Automapper to map the fields to the respective entity values to replace all of that with something like ProjectTo<LatestMeasureSummaryDTO> where Automapper has rules to map a MeasureDevice to resolve the latest Measure and extract the needed fields.

The benefits of projection are handling otherwise complex/clumsy eager loading, building optimized payloads with only the fields a consumer needs, and resilience in a changing system where new relationships don't accidentally introduce lazy loading performance issues. For example if Measure currently only has MeasureAlarm to eager load, everything works. But down the road if a new relationship is added to Measure or MeasureAlarm and your payload containing those entities are serialized, that serialization call will now "trip" lazy loading on the new relationship unless you revisit all queries retrieving these entities and add more eager loads, or start worrying about disabling lazy loading entirely. Projections remain the same until only if and when the fields they need to return actually need to change.

Beyond that, the next thing you can investigate is to run the resulting query through an analyzer, such as within SQL Management Studio to return the execution plan and identify whether the query could benefit from indexing changes.

Upvotes: 1

Related Questions