Levon
Levon

Reputation: 584

Why does this linq with reflection statement beat my compiled expression tree?

Having been inspired by this blogpost I set out to refactor the following Linq query using compiled expression trees:

var result = dummies.Select(y =>
                   y.GetType().GetProperties()
                   .Where(x => x.GetMethod.IsPublic)
                   .Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
                   .ToDictionary(x => x.Name, x => x.GetValue(y)))
                   .Where(x => x.Any());

The code is intended to extract a set of values of specified properties, returning a dictionary for each element. In my first encounter with writing my own expression trees I've come up with this solution to generate the property calls:

        foreach (string propName in Properties)
        {
            var prop = typeof(DummyType).GetProperty(propName);

            if (prop != null)
            {
                props.Add(prop);
            }
        }

        var accessors = new List<Tuple<string, Func<DummyType, object>>>();

        foreach (var prop in props)
        {
            var instance = Expression.Parameter(typeof(DummyType));
            var call = Expression.Property(instance, prop);
            var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();
            accessors.Add(Tuple.Create(prop.Name, expr));
        }

For each DummyType element the calls in accessors will be iterated This implementation is unable to deal with properties returning value types, though I was able to solve that issue using MakeGenericType combined with a DynamicInvoke call but because it's documented as "late-bound" I've discarded it to avoid it distorting the performance.

The results are surprising with the Linq query beating my expression trees hands down even though it boxes value types for me and GetProperties is called for each element, whereas the linq expression property accessors are generated before collecting the values from the collection of dummytypes.

|   Method |         Mean |      Error |     StdDev |  Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|-------:|--------:|
|     Linq |     73.09 ns |   0.878 ns |   0.778 ns |   1.00 |    0.00 |
| ExprTree | 16,293.69 ns | 184.834 ns | 172.894 ns | 222.83 |    3.96 |

The benchmarks are generated using benchmark.net.

  1. Why is the expression tree approach significantly slower?
  2. Would it be fair to assume the expression tree solution is faster?
  3. Bonus: What effect does using the MakeGenericType solution have on performance in this context?

EDIT: I've refactored the code somewhat

    public class MyBenchMarks    
    {
        IEnumerable<DummyType> Dummies = DummyType.GenerateDummySet();
        IEnumerable<string> Properties = new string[] { "Prop1" };

        [Benchmark(Description = "Linq", Baseline = true)]
        public Object LinqSolution() => new Mappers().LinqSolution(Properties, Dummies);


        [Benchmark(Description = "ExprTree", Baseline = false)]
        public void ExprTreeSolution() => new Mappers().ExprTreeSolution(Properties, Dummies);
    }

    public class Mappers
    {
        List<Tuple<string, Func<DummyType, object>>> GetAccessors(IEnumerable<string> fields)
        {
            List<PropertyInfo> props = new List<PropertyInfo>(fields.Select(x => typeof(DummyType).GetProperty(x)).Where(x => x != null));
            var accessors = new List<Tuple<string, Func<DummyType, object>>>();

            foreach (var prop in props)
            {
                var instance = Expression.Parameter(typeof(DummyType));
                var call = Expression.Property(instance, prop);
                var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();

                accessors.Add(Tuple.Create(prop.Name, expr));
            }

            return accessors;
        }

        public IEnumerable<KeyValuePair<string, object>> ExprTreeSolution(IEnumerable<string> fields, IEnumerable<DummyType> dummies)
        {
            List<KeyValuePair<string, object>> result = new List<KeyValuePair<string, object>>();
            var accessors = GetAccessors(fields);

            foreach (var dummy in dummies)
            {
                foreach (var accessor in accessors)
                {
                    var propResult = accessor.Item2(dummy);
                    result.Add(KeyValuePair.Create(accessor.Item1, propResult));
                }
            }

            return result;
        }

        public IEnumerable<KeyValuePair<string, object>> LinqSolution<T>(IEnumerable<String> fields, IEnumerable<T> dummies)
        {
            var result = dummies.Select(y =>
                  y.GetType().GetProperties()
                  .Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
                  .Select(x => KeyValuePair.Create(x.Name, x.GetValue(y))).ToList())
                  .SelectMany(x => x);

            return result;
        }
    }

    public class DummyType
    {
        public bool Prop0 { get; set; }
        public string Prop1 { get; set; }
        public int Prop2 { get; set; }

        public static List<DummyType> GenerateDummySet()
        {
            return Enumerable.Range(0, 100).Select(x =>
                 new DummyType
                 {
                     Prop0 = true,
                     Prop1 = "fooBar",
                     Prop2 = x
                 }).ToList();
        }
    }

The corresponding results:

BenchmarkDotNet = v0.12.1, OS = Windows 10.0.19041.630(2004 /?/ 20H1)
Intel Core i5-8600K CPU 3.60GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET Core SDK=5.0.100  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT[AttachedDebugger]
DefaultJob : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

|   Method |         Mean |      Error |     StdDev |    Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|---------:|--------:|
| Linq     | 66.14 ns     | 0.162 ns   | 0.143 ns   | 1.00     | 0.00    |
| ExprTree | 70,366.57 ns | 500.248 ns | 443.457 ns | 1,063.84 | 7.51    |

this code can be run from a console application using

BenchmarkRunner.Run(typeof(Program).Assembly);

Upvotes: 0

Views: 225

Answers (2)

StriplingWarrior
StriplingWarrior

Reputation: 156728

The main problem is that LinqSolution is returning a deferred LINQ IEnumerable<>. It's not actually doing the work (reflection). Try changing return result to return result.ToList(). That'll at least help ensure you're comparing apples to apples.

enter image description here

Beyond that, recognize that the act of compiling an expression is quite expensive. You probably won't see a big performance gain unless you reuse the compiled function many more times. To see this in action, try generating 10000 items in GenerateDummySet instead of just 100.

enter image description here

To take advantage of this in real-world code, try memoizing the compiled function (using a static Lazy<> initialization, for example).

Upvotes: 2

quetzalcoatl
quetzalcoatl

Reputation: 33566

Another guess, adding to my comments: I just noticed that both version A and version B seems to return the final values as read from the dummies' properties.

If you have measured the time of version A and version B exactly as they are shown here, then do note that you're measuring not only the time of accessing the data via one or the other way, but you are also measuring the time for setting up all things.

For example, in version A, you are probably saving some time using y.GetType().GetProperties() which gets all props in one go, while in version B you're doing something probably very wasteful by going over some "properties" list and separately looking up each property by name: var prop = typeof(DummyType).GetProperty(propName);

Furthermore, if you measured it as shown, then version B includes generating dynamic assembly (or assembles) at runtime at the point where you .Compile(); the expression. That might have costed a lot of time, that could add a lot to the statistics.

So... I think you have to seriously revise your question. It's missing a lot of important information, and all we can do is guessing.

Upvotes: 0

Related Questions