micahhoover
micahhoover

Reputation: 2160

where lambda vs. first lambda

Suppose I have some strings:

string[] strings = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

What is the difference between:

string startsWithO = strings.First(s => s[0] == 'o');

And:

string startsWithO = strings.Where(s => s[0] == 'o').First();

Since Where() is deferred it shouldn't slow down the execution, right?

Upvotes: 8

Views: 1188

Answers (3)

Eamon Nerbonne
Eamon Nerbonne

Reputation: 48066

The performance penalty of using .Where(filter).First() rather than .First(filter) will usually be very small.

However, they're not the same - Where generates a new iterator that First can simply take one element of, whereas First(filter) can microoptimize by using just one loop and a direct return whenever filter matches.

So while both approaches have the same semantics and both execute the filter equally often (only as often as necessary), using First with a filter parameter doesn't need to create an intermediate iterator object and probably avoids some very simple method calls on that iterator too.

In other words, if you're executing such code millions of times, you will see a slight performance difference - but nothing huge; I would never worry about it. Whenever that tiny performance difference actually matters you're much better off just writing the (very simple) foreach-with-if statement that's equivalent and avoiding the extra calls and object allocations inherent in LINQ - but remember that this is a microoptimization you'll rarely need.

Edit: Benchmark demonstrating the effect:

This takes 0.78 seconds:

for(int i=0;i<10*1000*1000;i++)
  Enumerable.Range(0,1000).First(n=> n > 2);
GC.Collect();

But this takes 1.41 seconds:

for(int i=0;i<10*1000*1000;i++)
  Enumerable.Range(0,1000).Where(n=> n > 2).First();
GC.Collect();

Whereas plain loops are much faster (0.13 seconds):

long bla = 0;
for(int i=0;i<10*1000*1000;i++)
    for(int n=0;n<1000;n++)
        if(n > 2) { bla+=n; break; }
GC.Collect();
Console.WriteLine(bla);//avoid optimizer cheating.

Note that this benchmark only shows such extreme differences because I have a trivial filter and a very short non-matching prefix.

Based on some quick experimentation, the difference seems largely attributable to the details of which codepath gets taken. So, for array's and List<>s the first variant is actually faster, likely to do special-casing in .Where for those types that First doesn't have; for custom iterators, the second version is a tiny bit faster, as expected.

Summary:

.Where(...).First() is roughly as fast as .First(...) - don't bother choosing one or the other as an optimization. In general .First(...) is very slightly faster but in some common cases it is slower. If you really need that microoptimization then use plain loops which are faster than either.

Upvotes: 12

SWeko
SWeko

Reputation: 30902

In the specific case, calling First and Where on a string[], the methods called are the Enumerable.Where and Enumerable.Firstextension methods.

Enumerable.Where does this:

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
  // null checks omitted
  if (source is TSource[]) 
     return new WhereArrayIterator<TSource>((TSource[])source, predicate); 
  //the rest of the method will not execute
}

and the constructor of WhereArrayIterator does just:

public WhereArrayIterator(TSource[] source, Func<TSource, bool> predicate) {
  this.source = source; 
  this.predicate = predicate;
} 

So nothing is actually done here, except to create an iterator.

The first First method, without a predicate does this:

public static TSource First<TSource>(this IEnumerable<TSource> source) { 
  //null check
  IList<TSource> list = source as IList<TSource>;
  if (list != null) {
     //this branch is not taken as string[] does not implement IList<string>
     if (list.Count > 0) return list[0]; 
  }
  else { 
    //this is actually the WhereArrayIterator from before
    using (IEnumerator<TSource> e = source.GetEnumerator()) { 
      if (e.MoveNext()) 
        return e.Current;
    } 
  }
  throw Error.NoElements();
}

However, the second First does this

public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
   //null checks
   foreach (TSource element in source) {
     if (predicate(element)) return element; 
   }
   throw Error.NoMatch();
}

Which in the case of an array, is as fast as direct linear access.
In a nutshell, this means that calling First(predicate) on an array will be somewhat faster, by a not large, but still detectable factor. This might not hold for lists, and will certainly not hold for IQueryable objects, that are a completely different story.

However, this is micro-optimization at it's worst. Unless this is done millions of times, it won't save too many seconds. Even as I know this now, I'll still use whatever is clearer to read and understand.

Upvotes: 1

Sten Petrov
Sten Petrov

Reputation: 11040

There are no differences here.

Calling Where first returns an iterator that's not used until First starts looping over it.

If the predicate doesn't match any elements then the same exception InvalidOperationException is thrown.

The only difference is the verbosity of the code, so .First without .Where should be preferred

Upvotes: 1

Related Questions