Reputation: 3775
I know this question has been asked a lot by people and some even say
So, first(FirstOrDefault(predicate)) one is better in terms of performance1
and I get it, I also think that one more method call should be slightly slower, its just intuition that I have. Nevertheless, I decided to run some benchmarks in order to prove my right and boy little did I know.
This is what I had as results of running my benchmarks:
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-3630QM CPU 2.40GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
[Host] : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
Job-XMZTSC : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT
Runtime=.NET 4.7.2
Method N Mean Error StdDev Ratio RatioSD
WherePlusFirstOrDefaultArray 10000 31.44 us 0.288 us 0.270 us 0.40 0.00
FirstOrDefaultArray 10000 78.47 us 0.679 us 0.635 us 1.00 0.00
WherePlusFirstOrDefaultList 10000 54.27 us 1.070 us 1.099 us 0.69 0.02
FirstOrDefaultList 10000 100.84 us 1.722 us 1.611 us 1.29 0.02
WherePlusFirstOrDefaultArray 100000 325.41 us 4.840 us 4.527 us 0.39 0.01
FirstOrDefaultArray 100000 829.85 us 16.513 us 15.446 us 1.00 0.00
WherePlusFirstOrDefaultList 100000 558.10 us 5.507 us 5.151 us 0.67 0.01
FirstOrDefaultList 100000 1,026.93 us 17.648 us 16.508 us 1.24 0.02
WherePlusFirstOrDefaultArray 1000000 3,255.46 us 9.615 us 7.507 us 0.40 0.01
FirstOrDefaultArray 1000000 8,134.15 us 108.425 us 101.420 us 1.00 0.00
WherePlusFirstOrDefaultList 1000000 5,477.63 us 70.584 us 66.024 us 0.67 0.01
FirstOrDefaultList 1000000 9,987.54 us 64.239 us 60.089 us 1.23 0.02
Not only Where(predicate).FirstOrDefault()
was faster, but at what margin.
This is my benchmark code using BenchmarkDotNet
[SimpleJob(RuntimeMoniker.Net472)]
public class Benchmarks
{
private int[] _array;
private List<int> _list;
[Params(10000, 100000, 1000000)]
public int N;
[GlobalSetup]
public void Setup()
{
_array = new int[N];
_list = new List<int>(N);
_array = Enumerable
.Repeat(0, N - 1).ToArray();
_list = Enumerable
.Repeat(0, N - 1).ToList();
_array[N - 2] = 7;
_list[N - 2] = 7;
}
[Benchmark]
public int WherePlusFirstOrDefaultArray()
{
var seven = _array.Where(n => n == 7).FirstOrDefault();
return seven;
}
[Benchmark(Baseline = true)]
public int FirstOrDefaultArray()
{
var seven = _array.FirstOrDefault(n => n == 7);
return seven;
}
[Benchmark]
public int WherePlusFirstOrDefaultList()
{
var seven = _list.Where(n => n == 7).FirstOrDefault();
return seven;
}
[Benchmark]
public int FirstOrDefaultList()
{
var seven = _list.FirstOrDefault(n => n == 7);
return seven;
}
}
Since I was stunned from the results leaving me with no other choice but to ask you guys what am I doing wrong or am I missing something?
EDIT:
I added benchmarks for array vs list structure to the guys thinking it might be because of the List
.
EDIT2: The saga continues and I think I am closer to the answer. Adding hardware counter to my benchmark yielded following interesting results:
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-3630QM CPU 2.40GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
[Host] : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
Job-ZTIMEH : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT
Runtime=.NET 4.7.2
Method N Mean Error StdDev Ratio RatioSD CacheMisses/Op BranchMispredictions/Op
WherePlusFirstOrDefaultArray 1000000 3.222 ms 0.0224 ms 0.0210 ms 0.39 0.01 885 327
FirstOrDefaultArray 1000000 8.166 ms 0.1992 ms 0.1863 ms 1.00 0.00 1,795 810
WherePlusFirstOrDefaultList 1000000 5.564 ms 0.1066 ms 0.1228 ms 0.68 0.02 1,051 503
FirstOrDefaultList 1000000 10.161 ms 0.1816 ms 0.1699 ms 1.24 0.03 3,451 1,442
For some reason, I can't still explain to myself why FirstOrDefault(predicate)
method is yielding 2 to 3 times more branch mispredictions and ofc cache misses than Where(predicate).FirstOrDefault()
, surely this has to play a bit part in the results I am observing previously.
Also, one curious thing, if you look at FirstOrDefaultArray
and FirstOrDefaultList
results and compare them you will see that list is 24% slower, but assemblies generated by these methods are identical to me: https://www.diffchecker.com/WSjAQlet (I stripped memory addresses of the instructions.)
Upvotes: 6
Views: 1367
Reputation: 34708
This one really got me interested, so I did a bit more digging, and the answer looks to be related to that when FirstOrDefault
is called on a Where
results, the predicate executes under a WhereArrayIterator or a WhereEnumerableIterator depending on the source type.
When FirstOrDefault(predicate)
is used, it is iterating directly against the source array.
The difference?
FirstOrDefault(predicate)
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
if (source == null) throw Error.ArgumentNull("source");
if (predicate == null) throw Error.ArgumentNull("predicate");
foreach (TSource element in source) {
if (predicate(element)) return element;
}
return default(TSource);
}
vs. WhereArrayIterator's MoveNext
public override bool MoveNext() {
if (state == 1) {
while (index < source.Length) {
TSource item = source[index];
index++;
if (predicate(item)) {
current = item;
return true;
}
}
Dispose();
}
return false;
}
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null) {
if (list.Count > 0) return list[0];
}
else {
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return e.Current;
}
}
return default(TSource);
}
The key difference is that the WhereIterator
classes both use while
loops to iterate through the enumerable/array where the FirstOrDefault(predicate)
is using foreach
. Over larger sets, foreach
is notably slower than while
or for
loops, so I suspect this will explain a good share of the variance.
Source above pulled from source reference: https://referencesource.microsoft.com/#system.core/System/Linq/Enumerable.cs,709a06a6b65427d6 and https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,8087366974af11d2
Regarding the original assumption that Where(predicate).FirstOrDefault()
might be assumed to be slower than FirstOrDefault(predicate)
due to the extra method call, this can be disproved by the following test:
[Test]
public void Test()
{
var data = new[] { 0, 0, 7, 0 };
var test1 = data.FirstOrDefault(isSeven);
var test2 = data.Where(isSeven).FirstOrDefault();
}
private bool isSeven(int val)
{
return val == 7;
}
with a breakpoint inside the isSeven
method. In both cases the predicate is only called 3 times where one might assume that it is "executed" in the Where(predicate)
call which would result in 4 calls, but it is only executed when FirstOrDefault()
wants to iterated through the iterator, triggering MoveNext()
on the WhereEnumerator.
Upvotes: 0
Reputation: 26917
The generic Enumerable.Where
function maps to different subclasses based on the type of argument. In this case, your argument is a List<int>
so you get returned from Where
a Enumerable.WhereListIterator<int>
that takes a List<int>
parameter. It then uses List<T>.GetEnumerator()
to enumerator the list, which returns a List<T>.Enumerator
struct
, which uses an index to index into the List<>
and return each member. This is very fast.
FirstOrDefault(Func<> pred)
doesn't have this optimization, and uses foreach
to traverse the list. While this also ultimately uses the same very fast List<T>.Enumerator
, it calls its member methods via the IEnumerable<T>
interface, which is considerably slower than calling the List<T>.Enumerator
methods directly.
My testing shows that the result is FirstOrDefault(Func<> pred)
takes about twice as long per element of the source list. If you write your own FirstOrDefault<T>(List<T> src, Func<T,bool> pred)
using either GetEnumerator
or foreach
, it will run about twice as fast as the built-in FirstOrDefault
.
Upvotes: 4