UNeverNo

Reputation: 583

Why is LINQ so slow here (see the provided examples)?

This LINQ code is very slow:

IEnumerable<string> iedrDataRecordIDs = dt1.AsEnumerable()
    .Where(x => x.Field<string>(InputDataSet.Column_Arguments_Name) == sArgumentName 
         && x.Field<string>(InputDataSet.Column_Arguments_Value) == sArgumentValue)
    .Select(x => x.Field<string>(InputDataSet.Column_Arguments_RecordID));

IEnumerable<string> iedrDataRecordIDs_Filtered = dt2.AsEnumerable()
    .Where(x => iedrDataRecordIDs.Contains(
                 x.Field<string>(InputDataSet.Column_DataRecordFields_RecordID)) 
             && x.Field<string>(InputDataSet.Column_DataRecordFields_Field) 
                 == sDataRecordFieldField 
             && x.Field<string>(InputDataSet.Column_DataRecordFields_Value) 
                 == sDataRecordFieldValue)
    .Select(x => x.Field<string>(InputDataSet.Column_DataRecordFields_RecordID));

IEnumerable<string> ieValue = dt2.AsEnumerable()
    .Where(x => x.Field<string>(InputDataSet.Column_DataRecordFields_RecordID) 
                == iedrDataRecordIDs_Filtered.FirstOrDefault() 
            && x.Field<string>(InputDataSet.Column_DataRecordFields_Field) == sFieldName)
    .Select(x => x.Field<string>(InputDataSet.Column_DataRecordFields_Value));

if (!ieValue.Any()) //very slow at this point
    return iedrDataRecordIDs_Filtered.FirstOrDefault();

This change speeds it up by a factor of 10 or more:

string sRecordID = dt2.AsEnumerable()
    .Where(x => iedrDataRecordIDs.Contains(
            x.Field<string>(InputDataSet.Column_DataRecordFields_RecordID)) 
        && x.Field<string>(InputDataSet.Column_DataRecordFields_Field) 
            == sDataRecordFieldField 
        && x.Field<string>(InputDataSet.Column_DataRecordFields_Value) 
            == sDataRecordFieldValue)
    .Select(x => x.Field<string>(InputDataSet.Column_DataRecordFields_RecordID))
    .FirstOrDefault();

IEnumerable<string> ieValue = dt2.AsEnumerable()
    .Where(x => x.Field<string>(InputDataSet.Column_DataRecordFields_RecordID) == sRecordID 
        && x.Field<string>(InputDataSet.Column_DataRecordFields_Field) == sFieldName)
    .Select(x => x.Field<string>(InputDataSet.Column_DataRecordFields_Value));

if (!ieValue.Any()) //very fast at this point
    return sRecordID;

The only change is that I store the result directly in a new variable and build the where clause with this value instead of with a LINQ query (which should be evaluated when needed). But LINQ seems to evaluate it in a bad way here, or am I doing something wrong?

Here are some values from my data:

dt1.Rows.Count                     142 
dt1.Columns.Count                    3 
dt2.Rows.Count                     159 
dt2.Columns.Count                    3 
iedrDataRecordIDs.Count()            1 
iedrDataRecordIDs_Filtered.Count()   1 
ieValue.Count()                      1

Upvotes: 1

Views: 356

Answers (1)

Kirk Broadhurst

Reputation: 28708

You're asking why

IEnumerable<string> iedrDataRecordIDs_Filtered = data;
foreach (var item in collection)
{
    // the query behind iedrDataRecordIDs_Filtered is re-executed on every iteration
    var id = iedrDataRecordIDs_Filtered.FirstOrDefault();
    // do something with id
}

is slower than

// the query is executed exactly once, here
string sRecordID = data.FirstOrDefault();
foreach (var item in collection)
{
    // do something with sRecordID (a plain string, no query involved)
}

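Very simply, because you're re-evaluating the iedrDataRecordIDs_Filtered query every time you call FirstOrDefault, and in your code that call sits inside the Where predicate, so it runs once per row of dt2. The variable doesn't hold a concrete collection; it holds a lazily evaluated enumerable, which is really just a function that returns some objects. Every time you query it, the function is called again and you pay that execution cost again (including the nested iedrDataRecordIDs.Contains call, which in turn re-enumerates dt1).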
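To make the cost visible, here is a minimal sketch that counts how often a filter predicate runs in both variants. The string array is a hypothetical stand-in for dt2 (159 rows, like in the question), and the names DeferredExecutionDemo and CountPredicateCalls are made up for illustration; it is not the asker's actual data or code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many times the inner filter predicate ran in each variant.
    public static (int Deferred, int Cached) CountPredicateCalls()
    {
        // Hypothetical stand-in for the rows of dt2.
        string[] rows = Enumerable.Range(0, 159).Select(i => "id" + i).ToArray();
        int calls = 0;

        // Deferred query: nothing runs until something enumerates it.
        IEnumerable<string> filtered = rows.Where(r => { calls++; return r == "id100"; });

        // Variant 1: FirstOrDefault() sits inside the outer predicate,
        // so the inner query is re-run once per outer row.
        _ = rows.Count(r => r == filtered.FirstOrDefault());
        int deferred = calls;

        // Variant 2: run the inner query once and compare against the stored string.
        calls = 0;
        string first = filtered.FirstOrDefault();
        _ = rows.Count(r => r == first);
        int cached = calls;

        return (deferred, cached);
    }

    public static void Main()
    {
        var (deferred, cached) = CountPredicateCalls();
        // prints "deferred: 16059 predicate calls, cached: 101"
        Console.WriteLine($"deferred: {deferred} predicate calls, cached: {cached}");
    }
}
```

Each FirstOrDefault() call scans 101 rows before it finds the match, so the deferred variant does 159 × 101 predicate calls where the cached variant does 101. In the question the inner query additionally calls Contains on another deferred query, which multiplies the work once more.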

If you change it to

IEnumerable<string> iedrDataRecordIDs_Filtered = dt2.AsEnumerable()... 
var recordIDs = iedrDataRecordIDs_Filtered.ToList();

and then use recordIDs.FirstOrDefault(), you'll see a huge performance increase, because ToList() runs the query once and stores the results in memory.
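As a rough illustration of what ToList() buys you, the sketch below counts how often the underlying source is enumerated with and without materializing. The local iterator Source() is a hypothetical stand-in for the dt2 query, and MaterializeDemo and CountEnumerations are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Returns how many times the source was enumerated in each variant.
    public static (int Deferred, int Materialized) CountEnumerations()
    {
        int enumerations = 0;

        // Hypothetical stand-in for the dt2 query; the counter is
        // incremented each time an enumeration of the source starts.
        IEnumerable<string> Source()
        {
            enumerations++;
            yield return "42";
            yield return "43";
        }

        IEnumerable<string> query = Source().Where(id => id != null);

        // Deferred: every FirstOrDefault() starts a fresh enumeration.
        for (int i = 0; i < 1000; i++)
        {
            _ = query.FirstOrDefault();
        }
        int deferred = enumerations;

        // Materialized: ToList() enumerates once, later reads hit the list.
        enumerations = 0;
        List<string> recordIDs = query.ToList();
        for (int i = 0; i < 1000; i++)
        {
            _ = recordIDs.FirstOrDefault();
        }
        int materialized = enumerations;

        return (deferred, materialized);
    }

    public static void Main()
    {
        var (deferred, materialized) = CountEnumerations();
        // prints "deferred: 1000, materialized: 1"
        Console.WriteLine($"deferred: {deferred}, materialized: {materialized}");
    }
}
```

One trade-off to keep in mind: the list is a snapshot, so rows added to or changed in the table after the ToList() call are not reflected in recordIDs.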

Upvotes: 3
