Toto
Toto

Reputation: 7719

Caching Compiled Expression tree

How to efficiently cache methods compiled from an expression tree ?

public void SomeToStringCalls()
{
    ToString(i => (i + 1).ToString(), 1);
    ToString(i => (i + 1).ToString(), 2);
    ToString(i => (i + 2).ToString(), 3);
    ToString(i => (i + 2).ToString(), 4);
}

private string ToString<T>(Expression<Func<T, string>> expression, T input)
{
    var method = expression.Compile();
    return method.Invoke(input);
}

Above, every call is going to recompile each expression, even if some are identical. I can't have a Dictionary<Expression<Func<T, string>>, Func<T, string>>() caching the compiled method from the expression because the equals will fails.

Upvotes: 29

Views: 9570

Answers (7)

Bouke
Bouke

Reputation: 12138

See also System.Web.Mvc.ExpressionUtil.CachedExpressionCompiler which can be used to cache expression compilations. If using Mvc 5 call it through reflection, or otherwise you can bring a copy of the source code.

Upvotes: 0

Olivier
Olivier

Reputation: 5688

I found quite some time ago this article, which is exposing pros & cons of expression caching (with constant extraction... which allows you to compile .Where(t=>t.prop==3) and .Where(t=>t.prop==5) to the same delegate).

Upvotes: 5

Mike Strobel
Mike Strobel

Reputation: 25623

The problems with caching expression trees in a centralized cache are:

  1. You will need comprehensive equality comparison and hashing algorithms.
  2. You will need to implement these algorithms yourself, as the standard expression types do not provide them out of the box.

A comprehensive equality comparison will be expensive to perform, but the cost can be alleviated somewhat with a cheap hash function. Also, since expression trees are immutable, you can cache the hash code after you've calculated it for the first time. That may shave some time off lookups, but each time you check the cache using a newly created expression as the key (which, I would imagine, would be most of the time) you will at least need to hash the new expression.

Option 1: Local/Static Caching

An ideal solution would avoid all of this overhead. If it's feasible (i.e., if these expressions are not composed dynamically), your best bet is to simply cache the expression trees near their declaration sites. You should be able to store most (if not all) of these in static fields:

private static readonly Expression<Func<int, string>> _addOne =
    i => (i + 1).ToString();
private static readonly Expression<Func<int, string>> _addTwo =
    i => (i + 2).ToString();

public void SomeToStringCalls()
{
    ToString(_addOne, 1);
    ToString(_addOne, 2);
    ToString(_addTwo, 3);
    ToString(_addTwo, 4);
}

The downside is that you may end up with duplicate expressions across various types, but if you are not generating a very large number of expressions, this may not be an issue.

Option 2: Centralized Caching

If that's not an option for you, you'll have to implement a centralized cache and the hashing and equality algorithms required to do so. The hashing algorithm will help you narrow down your candidate set, so it's important that it work reasonably well, i.e., produce very few collisions in practice. However, given the complex nature of expression trees, you want to keep the costs down. Therefore, I would advise you not aim for a perfect hashing function, but one that is reasonably cheap, yet effective. Perhaps something like this:

internal static class ExpressionHasher
{
    private const int NullHashCode = 0x61E04917;

    [ThreadStatic]
    private static HashVisitor _visitor;

    private static HashVisitor Visitor
    {
        get
        {
            if (_visitor == null)
                _visitor = new HashVisitor();
            return _visitor;
        }
    }

    public static int GetHashCode(Expression e)
    {
        if (e == null)
            return NullHashCode;

        var visitor = Visitor;

        visitor.Reset();
        visitor.Visit(e);

        return visitor.Hash;
    }

    private sealed class HashVisitor : ExpressionVisitor
    {
        private int _hash;

        internal int Hash
        {
            get { return _hash; }
        }

        internal void Reset()
        {
            _hash = 0;
        }

        private void UpdateHash(int value)
        {
            _hash = (_hash * 397) ^ value;
        }

        private void UpdateHash(object component)
        {
            int componentHash;

            if (component == null)
            {
                componentHash = NullHashCode;
            }
            else
            {
                var member = component as MemberInfo;
                if (member != null)
                {
                    componentHash = member.Name.GetHashCode();

                    var declaringType = member.DeclaringType;
                    if (declaringType != null && declaringType.AssemblyQualifiedName != null)
                        componentHash = (componentHash * 397) ^ declaringType.AssemblyQualifiedName.GetHashCode();
                }
                else
                {
                    componentHash = component.GetHashCode();
                }
            }

            _hash = (_hash * 397) ^ componentHash;
        }

        public override Expression Visit(Expression node)
        {
            UpdateHash((int)node.NodeType);
            return base.Visit(node);
        }

        protected override Expression VisitConstant(ConstantExpression node)
        {
            UpdateHash(node.Value);
            return base.VisitConstant(node);
        }

        protected override Expression VisitMember(MemberExpression node)
        {
            UpdateHash(node.Member);
            return base.VisitMember(node);
        }

        protected override MemberAssignment VisitMemberAssignment(MemberAssignment node)
        {
            UpdateHash(node.Member);
            return base.VisitMemberAssignment(node);
        }

        protected override MemberBinding VisitMemberBinding(MemberBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberBinding(node);
        }

        protected override MemberListBinding VisitMemberListBinding(MemberListBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberListBinding(node);
        }

        protected override MemberMemberBinding VisitMemberMemberBinding(MemberMemberBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberMemberBinding(node);
        }

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            UpdateHash(node.Method);
            return base.VisitMethodCall(node);
        }

        protected override Expression VisitNew(NewExpression node)
        {
            UpdateHash(node.Constructor);
            return base.VisitNew(node);
        }

        protected override Expression VisitNewArray(NewArrayExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitNewArray(node);
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitParameter(node);
        }

        protected override Expression VisitTypeBinary(TypeBinaryExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitTypeBinary(node);
        }
    }
}

It's not perfect, but it should get you pretty good results:

  1. It drills down and includes every expression in the tree.
  2. At minimum, the NodeType of every sub-expression is included in the hash. One obvious (but potentially costly) improvement would be to also include the Type; try it if you find you are getting too many collisions.
  3. Members and types referenced in the expression are included.
  4. Constant values appearing in the tree are included.
  5. It doesn't require a heap allocation to run, at the expense of being non-reentrant (since you're only analyzing a top-level expression, this is fine). You can run it concurrently on multiple threads.

Since you're not actually overriding GetHashCode() for any of the expression types, the imperfections of the hashing function will not leak out and affect external code. This gives us a degree of freedom in bending the rules of hash functions.

Your equality comparison will need to be more comprehensive. Whereas the hashing function is effectively an 'estimate' used to minimize your candidate set, the equality comparison performs the actual matching, and it needs to be perfect. You could certainly use my proposed hashing solution as a template for how to approach the problem, but bear in mind you must perform an exhaustive comparison of all an expression's attributes. One thing to bear in mind is that you probably don't want to compare the names of the ParameterExpression nodes in the tree, but you'll want to maintain a mapping of the parameters/variables in the two trees you are comparing to make sure they represent the "same" value in the context of their parent expression tree.

With the exception of ignoring parameter/variable names, do not bother attempting to resolve "semantic equivalence", i.e., expressions which produce the same results and side effects but are not structurally identical. It cannot be done efficiently, and it's not worth it to try.

Lastly, you might be able to speed things up by implementing a two-level lookup: first, choose the correct cache based on the parameter type(s), then look for a match within that cache. This approach would be most effective if it is guaranteed that each lambda expression will have exactly one argument. The benefits would degrade with more arguments.

Upvotes: 19

user645280
user645280

Reputation:

Call me simplistic, but this seems about 4 times faster in one simple scenario I tested:

    public static Dictionary<string, object> cache
        = new Dictionary<string, object>();
    public static string ToString<T>(
            Expression<Func<T, string>> expression,
            T input)
    {
        string key = typeof(T).FullName + ":" + expression.ToString();
        object o;  cache.TryGetValue(key, out o);
        Func<T, string> method = (Func<T, string>)o;
        if (method == null)
        {
            method = expression.Compile();
            cache[key] = method;
        }
        return method.Invoke(input);
    }

Upvotes: -1

pil0t
pil0t

Reputation: 2185

If your goal is to compile + invoke for "extract value" from expression, may be you look into another way.

I trying to extract value from expression tree without compilation through reflection.

My solution is not fully support all expressions, at first it was written for caching method invokation without lambda and arithmetic, but with some improvements it can help.

Here it is:

private static object ExtractValue(Expression expression, object[] input, ReadOnlyCollection<ParameterExpression> parameters)
{
    if (expression == null)
    {
        return null;
    }

    var ce = expression as ConstantExpression;
    if (ce != null)
    {
        return ce.Value;
    }

    var pe = expression as ParameterExpression;
    if (pe != null)
    {
        return input[parameters.IndexOf(pe)];
    }

    var ma = expression as MemberExpression;
    if (ma != null)
    {
        var se = ma.Expression;
        object val = null;
        if (se != null)
        {
            val = ExtractValue(se, input, parameters);
        }

        var fi = ma.Member as FieldInfo;
        if (fi != null)
        {
            return fi.GetValue(val);
        }
        else
        {
            var pi = ma.Member as PropertyInfo;
            if (pi != null)
            {
                return pi.GetValue(val);
            }
        }
    }

    var mce = expression as MethodCallExpression;
    if (mce != null)
    {
        return mce.Method.Invoke(ExtractValue(mce.Object, input, parameters), mce.Arguments.Select(a => ExtractValue(a, input, parameters)).ToArray());
    }

    var sbe = expression as BinaryExpression;
    if (sbe != null)
    {
        var left = ExtractValue(sbe.Left, input, parameters);
        var right = ExtractValue(sbe.Right, input, parameters);

        // TODO: check for other types and operands

        if (sbe.NodeType == ExpressionType.Add)
        {
            if (left is int && right is int)
            {
                return (int) left + (int) right;
            }
        }

        throw new NotImplementedException();
    }

    var le = expression as LambdaExpression;
    if (le != null)
    {
        return ExtractValue(le.Body, input, le.Parameters);
    }

    // TODO: Check for other expression types

    var dynamicInvoke = Expression.Lambda(expression).Compile().DynamicInvoke();
    return dynamicInvoke;
}

With usage:

private static string ToString<T>(Expression<Func<T, string>> expression, T input)
{
    var sw = Stopwatch.StartNew();
    var method = expression.Compile();
    var invoke = method.Invoke(input);
    sw.Stop();
    Console.WriteLine("Compile + Invoke: {0}, {1} ms", invoke, sw.Elapsed.TotalMilliseconds);
    sw.Restart();
    var r2 = ExtractValue(expression, new object[] {input}, null);
    sw.Stop();
    Console.WriteLine("ExtractValue: {0}, {1} ms", r2, sw.Elapsed.TotalMilliseconds);
    return invoke;
}

I think, with some improvements and additional expression types this solution could be faster alternative for Compile().DynamicInvoke()

Upvotes: 0

Bas
Bas

Reputation: 27105

The problem you describe is severe in the sense that evaluating two expressions for semantic equality is at least as expensive as compiling the expression. To illustrate this, here is a link to an implementation for expression equality. This implementation is not perfect, for example:

MethodA() { MethodB(); }
MethodB() { ... }

In the example above, MethodA and MethodB are equivalent in the sense that they do the same thing, and you most likely want to consider them as equivalent. For example, building this in C# with compiler optimization enabled will replace the MethodB call with MethodA calls. There are a myriad of issues when comparing code and it is a topic of ongoing research.

You should consider a design where expressions are referenced by some kind of key that identifies it if you find out that compiling the expressions is the bottleneck in your application. By the time you have determined equality you could have already compiled it.

To comment on J0HN's answer, it compares the hash code of the body and the parameters, this is by no means a reliable solution, since it does not do a deep evaluation of the expression tree.

Also, take a look at this question as posted in the comments.

Upvotes: 1

J0HN
J0HN

Reputation: 26941

The reason why you can't use Dictionary<Expression<Func<T, string>>, Func<T, string>> is that Expression<T> GetHashCode is not smart enough to detect "equal" expressions. I'm not sure, but it's quite likely that Expression<T>.GetHashCode returns memory address of expression.

To address this issue you could introduce a more "smart" hash calculation. Let's consider equal expressions that have equal bodies. It's quite slippery path, but if you are willing to take responsibility on ensuring that:

  1. No two different expressions have the same hash code
  2. Two expressions with same body have the same hash code

you could achieve what you want.

Here's simple proof of concept code I've assembled for you at pastebin. It's not industrial strength (some hints in comments to improve it), however it clearly demonstrates the feasibility of approach.

A couple of things to consider before elaborating further: improper hash function might result in quite tricky bugs. So, think twice, write a lot of unit tests and prey :)

Upvotes: 2

Related Questions