Reputation: 7436
I've encountered with one performance problem that I can't quite understand. I know how to fix it but I don't understand Why that happens. It's just for fun!
Let's talk code. I simplified the code as much as I could to reproduce the issue.
Suppose we have a generic class. It has an empty list inside and does something with T
in constructor. It has Run
method that calls an IEnumerable<T>
method on the list, e.g. Any()
.
public class BaseClass<T>
{
private List<T> _list = new List<T>();
public BaseClass()
{
Enumerable.Empty<T>();
// or Enumerable.Repeat(new T(), 10);
// or even new T();
// or foreach (var item in _list) {}
}
public void Run()
{
for (var i = 0; i < 8000000; i++)
{
if (_list.Any())
// or if (_list.Count() > 0)
// or if (_list.FirstOrDefault() != null)
// or if (_list.SingleOrDefault() != null)
// or other IEnumerable<T> method
{
return;
}
}
}
}
Then we have a derived class which is empty:
public class DerivedClass : BaseClass<object>
{
}
Let's measure the performance of running ClassBase<T>.Run
method from both classes. Accessing from derived type is 4 times slower that from base class. And I can't understand why that happens. Compiled in Release mode, result is the same with warm up. It happens on .NET 4.5 only.
public class Program
{
public static void Main()
{
Measure(new DerivedClass());
Measure(new BaseClass<object>());
}
private static void Measure(BaseClass<object> baseClass)
{
var sw = Stopwatch.StartNew();
baseClass.Run();
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
}
Full listing on gist
Upvotes: 64
Views: 2106
Reputation: 1326
Update:
There's an answer from the CLR team on Microsoft Connect
It is related to dictionary lookups in shared generics code. The heuristic in runtime and JIT do not work well for this particular test. We will take a look what can be done about it.
In the meantime, you can workaround it by adding two dummy methods to the BaseClass (do not even need to be called). It will cause the heuristic to work as one would expect.
Original:
That's JIT fail.
Can be fixed by this crazy thing:
public class BaseClass<T>
{
private List<T> _list = new List<T>();
public BaseClass()
{
Enumerable.Empty<T>();
// or Enumerable.Repeat(new T(), 10);
// or even new T();
// or foreach (var item in _list) {}
}
public void Run()
{
for (var i = 0; i < 8000000; i++)
{
if (_list.Any())
{
return;
}
}
}
public void Run2()
{
for (var i = 0; i < 8000000; i++)
{
if (_list.Any())
{
return;
}
}
}
public void Run3()
{
for (var i = 0; i < 8000000; i++)
{
if (_list.Any())
{
return;
}
}
}
}
Note that Run2()/Run3() are not called from anywhere. But if you comment out Run2 or Run3 methods - you'll get performance penalty as before.
There's something related to stack alignment or to the size of method table, I guess.
P.S. You can replace
Enumerable.Empty<T>();
// with
var x = new Func<IEnumerable<T>>(Enumerable.Empty<T>);
still the same bug.
Upvotes: 31
Reputation: 711
After some experimentation, I found that Enumerable.Empty<T>
is always slow when T is a class type; if it is a value type it is faster, but dependent on the struct size.
I tested object, string, int, PointF, RectangleF, DateTime, Guid.
Looking at how it is implemented, I tried different alternatives, and found some that works fast.
Enumerable.Empty<T>
relies on the internal class EmptyEnumerable<TElement>
's Instance
static property.
Then, what Enumerable.Empty<T>
is really doing is only return an empty array of T.
Trying different approaches, I found that the slowness is caused by both the property and the volatile modifier.
Adopting a static field initialized to T[0] instead of Enumerable.Empty<T>
like
public static readonly T[] EmptyArray = new T[0];
the problem disappeared. Note that the readonly modifier is not determinant. Having the same static field declared with volatile or accessed through a property causes the problem.
Regards, Daniele.
Upvotes: 0
Reputation: 1
It seems there is an CLR optimizator issue. Turn off the "Optimize Code" on the Build tab and try to run your test again.
Upvotes: -2