StarLord

Reputation: 1003

How to optimize Linq query with large number of records?

Please help me optimize the code below. I have tried different approaches but I am not seeing a significant performance improvement. There are around 30k entries in the database and it takes around a minute to load locally.

var alarms = from healthIssue in _context.HealthIssues.AsNoTracking()
             join asset in _context.Assets.AsNoTracking() on healthIssue.AssetNumber equals asset.SerialNumber into joinedTable
             from data in joinedTable.DefaultIfEmpty()
             select new
             {
                 ID = healthIssue.ID,
                 AssetNumber = healthIssue.AssetNumber,
                 AlarmName = healthIssue.AlarmName,
                 // the loop below reads these, so they must be projected here
                 AlarmSeverityLevel = healthIssue.AlarmSeverityLevel,
                 AlarmLastUpdatedTime = healthIssue.AlarmLastUpdatedTime,
                 Crew = data.Crew,
             };
//alarmsViewModelList count is 30k  
var alarmsViewModelList = await alarms.ToListAsync();
//groupedData count = 12k 
var groupedData = alarmsViewModelList.Select(c => new { c.AssetNumber,c.AlarmName}).Distinct().ToList();
// filteralarms count = 20k 
var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).ToList();
for (int j = 0; j < groupedData.Count; j++)
{
    var alarm = groupedData[j];
    //The line is actually slowing the code.
    var alarmlist = filteralarms.AsEnumerable().Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber
                            ).Select
                            (c => new
                            {
                                HealthIssueID = c.ID,
                                AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                                AlarmSeverityLevel = c.AlarmSeverityLevel,
                                
                            }).OrderByDescending(c =>c.AlarmLastUpdateDateTime).ToList();
    int alarmCount = alarmlist.Count;
    if (alarmCount > 1)
    {
        businessLogicFunction(alarmlist); 
    }

}

Upvotes: 4

Views: 1834

Answers (11)

Ben L

Reputation: 744

If you populate a dictionary whose keys are the predicate values concatenated and whose values are lists of the selected alarm properties, you should be able to do much better.

var alarms = from healthIssue in _context.HealthIssues.AsNoTracking()
             join asset in _context.Assets.AsNoTracking() on healthIssue.AssetNumber equals asset.SerialNumber into joinedTable
             from data in joinedTable.DefaultIfEmpty()
             select new
             {
                 ID = healthIssue.ID,
                 AssetNumber = healthIssue.AssetNumber,
                 AlarmName = healthIssue.AlarmName,
                 // the dictionary below reads these, so they must be projected here
                 AlarmSeverityLevel = healthIssue.AlarmSeverityLevel,
                 AlarmLastUpdatedTime = healthIssue.AlarmLastUpdatedTime,
                 Crew = data.Crew,
             };

//alarmsViewModelList count is 30k  
var alarmsViewModelList = await alarms.ToListAsync();
//groupedData count = 12k 
var groupedData = alarmsViewModelList
  .Select(c => new { c.AssetNumber,c.AlarmName})
  .Distinct()
  .ToList();
// filteralarms count = 20k 
var filteralarms = alarmsViewModelList
 .Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).ToList();

//populate an in-memory dictionary keyed by the where-clause predicate values
//(assumes a simple Alarm DTO with the three properties set below)
var alarmDict = new Dictionary<string, List<Alarm>>();
foreach (var c in filteralarms) {
  var key = c.AlarmName + "|" + c.AssetNumber;
  if (!alarmDict.TryGetValue(key, out var list)) {
    list = new List<Alarm>();
    alarmDict[key] = list;
  }
  list.Add(new Alarm {
    HealthIssueID = c.ID,
    AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
    AlarmSeverityLevel = c.AlarmSeverityLevel
  });
}

for (int j = 0; j < groupedData.Count; j++)
{
  var alarm = groupedData[j];
  //use the dictionary for faster lookups; building the dictionary is now the more expensive operation
  var key = alarm.AlarmName + "|" + alarm.AssetNumber;
  if (alarmDict.TryGetValue(key, out var matches)) {
    var alarmlist = matches
      .OrderByDescending(c => c.AlarmLastUpdateDateTime)
      .ToList();

    int alarmCount = alarmlist.Count;
    if (alarmCount > 1)
    {
       businessLogicFunction(alarmlist); 
    }
  }
}

Upvotes: 0

dbc

Reputation: 116785

You are creating two lists derived from alarmsViewModelList:

  1. groupedData which are the distinct values of { alarm.AssetNumber, alarm.AlarmName }
  2. filteralarms which are all alarms with AlarmSeverityLevel != AlarmSeverityLevel.Unknown.

After creating these two lists, you loop through the first and attempt to cross-reference it with the values in the second via a linear search. This is an n-squared operation. But since the two lists were originally created from the same source data alarmsViewModelList, you can maintain a list of original objects for each grouping key by using Enumerable.GroupBy() instead of Distinct(). Doing so should completely eliminate the need for the n-squared cross-reference.

In addition, since you only want to pass alarms with a known severity level to your business logic function, you can filter for them upfront, before doing the grouping. This should improve performance linearly, depending on the number of alarms that are skipped.

Thus your code should look something like:

var groupedData = alarmsViewModelList
    .Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown)
    .GroupBy(c => new { c.AssetNumber, c.AlarmName })
    .Select(g => g.Select(c =>
                          new {
                              HealthIssueID = c.ID,
                              AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                              AlarmSeverityLevel = c.AlarmSeverityLevel,
                          }).OrderByDescending(c =>c.AlarmLastUpdateDateTime).ToList())
    .Where(l => l.Count > 1);

foreach (var alarmList in groupedData)
    businessLogicFunction(alarmList);

Notes:

  • When you do ToListAsync() you are fetching all the data locally. You might want to try doing the filtering or grouping on the server side rather than the client side.

Demo fiddles using synthetic data generated using Enumerable.Range() here:

  1. The original code with elapsed time = 4.2335498 seconds here.

    I had to tweak the code somewhat just to make it compile.

  2. The modified code with elapsed time = 0.0393871 seconds here.

Upvotes: 1

Roger Hill

Reputation: 4099

I am going to say that your question is the wrong question. It is good to write efficient business logic for processing your data, and an efficient algorithm can often improve performance by orders of magnitude (sorting lists is a great example).

However, if you want to make this go faster, you should look at optimizing your data as you pull it from the db. Modern databases are highly tuned to filter and join data very fast, by people who think about nothing else. Provided that you have some decent indexing on your tables / blobs / graph / whatever, you could include clauses in your db query to filter out records that don't need to be processed.

Pulling 30k records and sending them over the wire is going to take (on db time scales) a lot of effort. I would hope that you can grab all of this in a single query, as retrieving it in multiple pulls will take even longer.

I have no data on how long your query runs for, or what the load time is to transfer and deserialize your data. I would be willing to bet, though, that if you filter dead records in your query and, say, cut your payload in half, you will see a huge performance boost. Don't load records that you don't need to process. Also, if you can filter properly on the metal, your LINQ will probably just turn into a list that can be processed in O(n) time. Why tune your LINQ if you can tune the data it runs on?
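As a rough sketch of that idea (reusing the question's _context and enum, and assuming AlarmSeverityLevel is a column on HealthIssues), the filter and grouping can be pushed into the query EF Core sends to the server. Depending on your EF Core version the GroupBy may not translate, but the Where alone already shrinks the payload:

```csharp
// Sketch only: let the database filter (and, if translatable, group)
// before anything is materialized on the client.
var grouped = await _context.HealthIssues
    .AsNoTracking()
    .Where(h => h.AlarmSeverityLevel != AlarmSeverityLevel.Unknown) // dead records never cross the wire
    .GroupBy(h => new { h.AssetNumber, h.AlarmName })
    .Where(g => g.Count() > 1)                                      // only groups the business logic will use
    .Select(g => g.Select(h => new { h.ID, h.AlarmLastUpdatedTime, h.AlarmSeverityLevel }).ToList())
    .ToListAsync();
```

If the GroupBy fails to translate on your EF Core version, apply it after ToListAsync(); the server-side Where still cuts the payload.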

Good luck.

Upvotes: 0

gsavo

Reputation: 1

I'm thinking the issue is that you're running a Where clause and ToList 12k times inside the loop - that's what is slowing you down. What if you change out this block:

for (int j = 0; j < groupedData.Count; j++)
{
    var alarm = groupedData[j];
    //The line is actually slowing the code.
    var alarmlist = filteralarms.AsEnumerable().Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber
                            ).Select
                            (c => new
                            {
                                HealthIssueID = c.ID,
                                AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                                AlarmSeverityLevel = c.AlarmSeverityLevel,
                                
                            }).OrderByDescending(c =>c.AlarmLastUpdateDateTime).ToList();
    int alarmCount = alarmlist.Count;
    if (alarmCount > 1)
    {
        businessLogicFunction(alarmlist); 
    }

}

to a join? This should significantly improve your performance, since a LINQ-to-objects join builds a hash lookup instead of scanning the list each time. Since you're OK with using query syntax, try a more natural-looking join like this:

var alarmlist = from g in groupedData
                join f in filteralarms on new { g.AssetNumber, g.AlarmName }
                                   equals new { f.AssetNumber, f.AlarmName }
                select new
                {
                    HealthIssueID = f.ID,
                    AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(f.AlarmLastUpdatedTime),
                    AlarmSeverityLevel = f.AlarmSeverityLevel,
                };

int alarmCount = alarmlist.Count();
if (alarmCount > 1)
{
    businessLogicFunction(alarmlist.OrderByDescending(o => o.AlarmLastUpdateDateTime).ToList());
}

Hope that helps!

Upvotes: 0

sa-es-ir

Reputation: 5042

I don't see any answers optimising the alarmsViewModelList query itself: you already fetch 30K records from the database and then try to filter the alarms in application memory, but why not filter them on the database side?

It is worth mentioning that AsNoTracking() is useless here, because you project to an anonymous type and EF only tracks entities.
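To illustrate that point (a minimal sketch using the question's _context):

```csharp
// Both queries produce untracked results: anonymous types are never change-tracked.
var withCall    = _context.HealthIssues.AsNoTracking().Select(x => new { x.ID, x.AlarmName }).ToList();
var withoutCall = _context.HealthIssues.Select(x => new { x.ID, x.AlarmName }).ToList();

// AsNoTracking() only matters when you materialize entities:
var entities = _context.HealthIssues.AsNoTracking().ToList();
```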

I think the OP didn't clarify these points:

  • Which tables do AlarmSeverityLevel and AlarmLastUpdateDateTime come from? (I assume the first table.)
  • You select Crew but never use it later - why? Can we remove it?

So the statement could be like this:

var alarms = Context.HealthIssues
    .Where(x => x.AlarmSeverityLevel != AlarmSeverityLevel.Unknown)
    .Select(x => new { HealthIssueID = x.Id, x.AssetNumber, x.AlarmName, x.AlarmSeverityLevel, x.AlarmLastUpdateDateTime })
    .GroupBy(x => new { x.AssetNumber, x.AlarmName })
    .Select(x => new
    {
        ItemsCount = x.Count(),
        Items = x.Select(s => new
        {
            AlarmSeverityLevel = s.AlarmSeverityLevel,
            AlarmLastUpdatedTime = s.AlarmLastUpdateDateTime, //keep the raw value so the query still translates; can convert later
            HealthIssueID = s.HealthIssueID
        }).OrderByDescending(o => o.AlarmLastUpdatedTime)
    }).ToList();

for (int i = 0; i < alarms.Count; i++)
{
    var alarm = alarms[i];
    //you can convert AlarmLastUpdatedTime here for alarm.Items

    if (alarm.ItemsCount > 1)
        businessLogicFunction(alarm.Items);
}

I can't run this query, so please let me know if there are any issues.

Upvotes: 0

Areu

Reputation: 155

Avoid using ToList() and AsEnumerable() unnecessarily, as these operations can be expensive. Try to use the IQueryable interface to filter and sort the data, so that the database can handle these operations for you, e.g.:

var alarms = from healthIssue in _context.HealthIssues.AsNoTracking()
             join asset in _context.Assets.AsNoTracking() on healthIssue.AssetNumber equals asset.SerialNumber into joinedTable
             from data in joinedTable.DefaultIfEmpty()
             select new
             {
                 ID = healthIssue.ID,
                 AssetNumber = healthIssue.AssetNumber,
                 AlarmName = healthIssue.AlarmName,
                 // the severity filter below needs these in the projection
                 AlarmSeverityLevel = healthIssue.AlarmSeverityLevel,
                 AlarmLastUpdatedTime = healthIssue.AlarmLastUpdatedTime,
                 Crew = data.Crew,
             };

Instead of using a for loop to iterate over groupedData, use foreach, which is generally more efficient because it avoids the indexing overhead.

If possible, try to move the businessLogicFunction call outside of the loop, so that it is only called once for each distinct alarm.AlarmName and alarm.AssetNumber combination.


// filteralarms count = 20k 
var filteralarms = alarms.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown);

// groupedData count = 12k 
var groupedData = filteralarms.Select(c => new { c.AssetNumber, c.AlarmName }).Distinct();

foreach (var alarm in groupedData)
{
    // alarmlist count = ?
    var alarmlist = filteralarms.Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber
                                ).Select
                                (c => new
                                {
                                    HealthIssueID = c.ID,
                                    AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                                    AlarmSeverityLevel = c.AlarmSeverityLevel,

                                }).OrderByDescending(c => c.AlarmLastUpdateDateTime);

    if (alarmlist.Skip(1).Any()) // mirrors the original Count > 1 check without a full enumeration
    {
        businessLogicFunction(alarmlist);
    }
}

Prefer Any() over Count() when you only need to know whether matching elements exist.
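To illustrate (a self-contained sketch, not from the answer above): Count() on a non-collection sequence enumerates everything, while Any() stops as soon as it can, and Skip(1).Any() is the lazy equivalent of the original Count > 1 check:

```csharp
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> seq = Enumerable.Range(0, 1_000_000);

bool hasAny      = seq.Any();         // stops after the first element
bool hasMultiple = seq.Skip(1).Any(); // stops after the second element
bool viaCount    = seq.Count() > 1;   // enumerates all 1,000,000 elements
```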

Upvotes: 0

Evk

Reputation: 101483

You basically group the data by AlarmName + AssetNumber, filter out alarms with severity level Unknown, then run the business function on the grouped batches (after a minor adjustment). A more efficient approach would be something like this:

var grouped = alarmsViewModelList
    // throw away unknown, you are not using them anywhere
    .Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown)
    // group by AssetNumber + AlarmName
    .GroupBy(c => new { c.AssetNumber, c.AlarmName })
    .Select(gr => new
    {
        gr.Key.AlarmName,
        gr.Key.AssetNumber,
        // convert batch of this group to the desired form
        Items = gr.Select(c => new
        {
            HealthIssueID = c.ID,
            AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
            AlarmSeverityLevel = c.AlarmSeverityLevel,
        }).OrderByDescending(c => c.AlarmLastUpdateDateTime).ToList()
    });

foreach (var data in grouped) {
    if (data.Items.Count > 1) {
        businessLogicFunction(data.Items);
    }
}

Upvotes: 4

Vivek Nuna

Reputation: 1

I would go with these three optimizations in your code.

// ToList(), not ToListAsync(): alarmsViewModelList is already in memory
var groupedData = alarmsViewModelList.GroupBy(c => new { c.AssetNumber, c.AlarmName }).ToList();

var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).ToList();

foreach (var alarm in groupedData)
{
    var alarmlist = filteralarms.Where(c => c.AlarmName == alarm.Key.AlarmName && c.AssetNumber == alarm.Key.AssetNumber)
                    .Select(c => new
                    {
                        HealthIssueID = c.ID,
                        AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                        AlarmSeverityLevel = c.AlarmSeverityLevel,
                    }).OrderByDescending(c => c.AlarmLastUpdateDateTime).ToList();
}

Upvotes: 0

lork6

Reputation: 154

This is what I can do with LINQ.

//alarmsViewModelList count is 30k  
var alarmsViewModelList = await alarms.ToListAsync();
//groupedData is almost 12k 
var groupedData = alarmsViewModelList.Select(c => new { c.AssetNumber, c.AlarmName }).Distinct().ToList();
// filteralarms count is almost 20k 
var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).OrderByDescending(c => DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime)).ToList();
for (int j = 0; j < groupedData.Count; j++)
{
    var alarm = groupedData[j];
    //The line is actually slowing the code.
    var alarmlist = filteralarms.Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber);
    
    if (alarmlist.Count() > 1)
    {
        businessLogicFunction(alarmlist.Select
                            (c => new
                            {
                                HealthIssueID = c.ID,
                                AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                                AlarmSeverityLevel = c.AlarmSeverityLevel,
                                
                            }).ToList()); 
    }
filteralarms = filteralarms.Where(c => c.AlarmName != alarm.AlarmName || c.AssetNumber != alarm.AssetNumber).ToList();

}

The code above is O(2n), I think. And if you can, you can make it even faster by removing the ToList() in the businessLogicFunction call, like:

businessLogicFunction(alarmlist.Select
                        (c => new
                        {
                            HealthIssueID = c.ID,
                            AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                            AlarmSeverityLevel = c.AlarmSeverityLevel,

                        })); 

Edit: I changed the code to use an index instead of Skip, which is much faster. An even faster approach is to order both lists and scan them in lockstep, skipping past what has already been consumed, like this:

//alarmsViewModelList count is 30k  
var alarmsViewModelList = alarms.ToList();
// here groupedData looks like {(1,1),(2,1),(3,1),(4,1),(5,1),(6,1)}, because the list is ordered by AssetNumber then by AlarmName
var groupedData = alarmsViewModelList.Select(c => new { c.AssetNumber, c.AlarmName }).Distinct().OrderBy(c => c.AssetNumber).ThenBy(c => c.AlarmName).ToList();
// here filteralarms looks like {(1,1),(1,1),(1,1),(2,1),(2,1),(3,1),(3,1),(3,1),(4,1)...}
var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).OrderBy(c => c.AssetNumber).ThenBy(c => c.AlarmName).ToList();
int k = 0;
for (int j = 0; j < groupedData.Count; j++)
{
    var alarm = groupedData[j];
    var alarmlist = new List<Alarm>();
    // scan forward only; k never rewinds, so each filtered alarm is visited once
    for (; k < filteralarms.Count; k++)
    {
        if (filteralarms[k].AlarmName == alarm.AlarmName && filteralarms[k].AssetNumber == alarm.AssetNumber)
        {
            alarmlist.Add(filteralarms[k]);
        }
        else
        {
            break;
        }
    }
    if (alarmlist.Count > 1)
    {
        businessLogicFunction(alarmlist.Select
                            (c => new
                            {
                                HealthIssueID = c.ID,
                                AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
                                AlarmSeverityLevel = c.AlarmSeverityLevel,
                            }).OrderByDescending(c => c.AlarmLastUpdateDateTime).ToList());
    }
}

I think the code above is O(n), aside from the initial sorts.

Upvotes: 3

Connor Thornley

Reputation: 26

A simple logical performance gain would be to remove resolved alarms as you go:

...
var alarm = groupedData[j];
//The line is actually slowing the code.
var matchingAlarms = filteralarms.Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber).ToList();
//shrink the working set so later iterations have less to scan
filteralarms = filteralarms.Except(matchingAlarms).ToList();
var alarmlist = matchingAlarms.Select
                         (c => new
                         {
...

Upvotes: 0

Uğur Demirel

Reputation: 81

Try it with AsNoTracking, and if there is an OrderBy/OrderByDescending, apply it at the end:

// filter on the IQueryable (alarms), not the materialized list: a List<T> has no AsNoTracking()
var iq_filteralarms = alarms.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown); /* IQueryable */
foreach (var item in alarmsViewModelList.Select(c => new
{
    c.AssetNumber,
    c.AlarmName
}).Distinct())
{
    var iq_alarmlist = iq_filteralarms.Where(c => c.AlarmName == item.AlarmName && c.AssetNumber == item.AssetNumber).Select(c => new
    {
        c.ID,
        c.AlarmLastUpdatedTime,
        c.AlarmSeverityLevel
    });
    if (iq_alarmlist.Count() > 1)
    {
        businessLogicFunction(iq_alarmlist.AsEnumerable().Select(c => new
        {
            HealthIssueID = c.ID,
            AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
            AlarmSeverityLevel = c.AlarmSeverityLevel,
        }).OrderByDescending(c => c.AlarmLastUpdateDateTime));
    }
}

Upvotes: 0
