Reputation: 7719
How to efficiently cache methods compiled from an expression tree ?
public void SomeToStringCalls()
{
ToString(i => (i + 1).ToString(), 1);
ToString(i => (i + 1).ToString(), 2);
ToString(i => (i + 2).ToString(), 3);
ToString(i => (i + 2).ToString(), 4);
}
private string ToString<T>(Expression<Func<T, string>> expression, T input)
{
var method = expression.Compile();
return method.Invoke(input);
}
Above, every call is going to recompile each expression, even if some are identical. I can't have a Dictionary<Expression<Func<T, string>>, Func<T, string>>()
caching the compiled method from the expression because the equals
will fails.
Upvotes: 29
Views: 9570
Reputation: 12138
See also System.Web.Mvc.ExpressionUtil.CachedExpressionCompiler
which can be used to cache expression compilations. If using Mvc 5 call it through reflection, or otherwise you can bring a copy of the source code.
Upvotes: 0
Reputation: 5688
I found quite some time ago this article, which is exposing pros & cons of expression caching (with constant extraction... which allows you to compile .Where(t=>t.prop==3)
and .Where(t=>t.prop==5)
to the same delegate).
Upvotes: 5
Reputation: 25623
The problems with caching expression trees in a centralized cache are:
A comprehensive equality comparison will be expensive to perform, but the cost can be alleviated somewhat with a cheap hash function. Also, since expression trees are immutable, you can cache the hash code after you've calculated it for the first time. That may shave some time off lookups, but each time you check the cache using a newly created expression as the key (which, I would imagine, would be most of the time) you will at least need to hash the new expression.
An ideal solution would avoid all of this overhead. If it's feasible (i.e., if these expressions are not composed dynamically), your best bet is to simply cache the expression trees near their declaration sites. You should be able to store most (if not all) of these in static fields:
private static readonly Expression<Func<int, string>> _addOne =
i => (i + 1).ToString();
private static readonly Expression<Func<int, string>> _addTwo =
i => (i + 2).ToString();
public void SomeToStringCalls()
{
ToString(_addOne, 1);
ToString(_addOne, 2);
ToString(_addTwo, 3);
ToString(_addTwo, 4);
}
The downside is that you may end up with duplicate expressions across various types, but if you are not generating a very large number of expressions, this may not be an issue.
If that's not an option for you, you'll have to implement a centralized cache and the hashing and equality algorithms required to do so. The hashing algorithm will help you narrow down your candidate set, so it's important that it work reasonably well, i.e., produce very few collisions in practice. However, given the complex nature of expression trees, you want to keep the costs down. Therefore, I would advise you not aim for a perfect hashing function, but one that is reasonably cheap, yet effective. Perhaps something like this:
internal static class ExpressionHasher
{
private const int NullHashCode = 0x61E04917;
[ThreadStatic]
private static HashVisitor _visitor;
private static HashVisitor Visitor
{
get
{
if (_visitor == null)
_visitor = new HashVisitor();
return _visitor;
}
}
public static int GetHashCode(Expression e)
{
if (e == null)
return NullHashCode;
var visitor = Visitor;
visitor.Reset();
visitor.Visit(e);
return visitor.Hash;
}
private sealed class HashVisitor : ExpressionVisitor
{
private int _hash;
internal int Hash
{
get { return _hash; }
}
internal void Reset()
{
_hash = 0;
}
private void UpdateHash(int value)
{
_hash = (_hash * 397) ^ value;
}
private void UpdateHash(object component)
{
int componentHash;
if (component == null)
{
componentHash = NullHashCode;
}
else
{
var member = component as MemberInfo;
if (member != null)
{
componentHash = member.Name.GetHashCode();
var declaringType = member.DeclaringType;
if (declaringType != null && declaringType.AssemblyQualifiedName != null)
componentHash = (componentHash * 397) ^ declaringType.AssemblyQualifiedName.GetHashCode();
}
else
{
componentHash = component.GetHashCode();
}
}
_hash = (_hash * 397) ^ componentHash;
}
public override Expression Visit(Expression node)
{
UpdateHash((int)node.NodeType);
return base.Visit(node);
}
protected override Expression VisitConstant(ConstantExpression node)
{
UpdateHash(node.Value);
return base.VisitConstant(node);
}
protected override Expression VisitMember(MemberExpression node)
{
UpdateHash(node.Member);
return base.VisitMember(node);
}
protected override MemberAssignment VisitMemberAssignment(MemberAssignment node)
{
UpdateHash(node.Member);
return base.VisitMemberAssignment(node);
}
protected override MemberBinding VisitMemberBinding(MemberBinding node)
{
UpdateHash((int)node.BindingType);
UpdateHash(node.Member);
return base.VisitMemberBinding(node);
}
protected override MemberListBinding VisitMemberListBinding(MemberListBinding node)
{
UpdateHash((int)node.BindingType);
UpdateHash(node.Member);
return base.VisitMemberListBinding(node);
}
protected override MemberMemberBinding VisitMemberMemberBinding(MemberMemberBinding node)
{
UpdateHash((int)node.BindingType);
UpdateHash(node.Member);
return base.VisitMemberMemberBinding(node);
}
protected override Expression VisitMethodCall(MethodCallExpression node)
{
UpdateHash(node.Method);
return base.VisitMethodCall(node);
}
protected override Expression VisitNew(NewExpression node)
{
UpdateHash(node.Constructor);
return base.VisitNew(node);
}
protected override Expression VisitNewArray(NewArrayExpression node)
{
UpdateHash(node.Type);
return base.VisitNewArray(node);
}
protected override Expression VisitParameter(ParameterExpression node)
{
UpdateHash(node.Type);
return base.VisitParameter(node);
}
protected override Expression VisitTypeBinary(TypeBinaryExpression node)
{
UpdateHash(node.Type);
return base.VisitTypeBinary(node);
}
}
}
It's not perfect, but it should get you pretty good results:
NodeType
of every sub-expression is included in the hash. One obvious (but potentially costly) improvement would be to also include the Type
; try it if you find you are getting too many collisions.Since you're not actually overriding GetHashCode()
for any of the expression types, the imperfections of the hashing function will not leak out and affect external code. This gives us a degree of freedom in bending the rules of hash functions.
Your equality comparison will need to be more comprehensive. Whereas the hashing function is effectively an 'estimate' used to minimize your candidate set, the equality comparison performs the actual matching, and it needs to be perfect. You could certainly use my proposed hashing solution as a template for how to approach the problem, but bear in mind you must perform an exhaustive comparison of all an expression's attributes. One thing to bear in mind is that you probably don't want to compare the names of the ParameterExpression
nodes in the tree, but you'll want to maintain a mapping of the parameters/variables in the two trees you are comparing to make sure they represent the "same" value in the context of their parent expression tree.
With the exception of ignoring parameter/variable names, do not bother attempting to resolve "semantic equivalence", i.e., expressions which produce the same results and side effects but are not structurally identical. It cannot be done efficiently, and it's not worth it to try.
Lastly, you might be able to speed things up by implementing a two-level lookup: first, choose the correct cache based on the parameter type(s), then look for a match within that cache. This approach would be most effective if it is guaranteed that each lambda expression will have exactly one argument. The benefits would degrade with more arguments.
Upvotes: 19
Reputation:
Call me simplistic, but this seems about 4 times faster in one simple scenario I tested:
public static Dictionary<string, object> cache
= new Dictionary<string, object>();
public static string ToString<T>(
Expression<Func<T, string>> expression,
T input)
{
string key = typeof(T).FullName + ":" + expression.ToString();
object o; cache.TryGetValue(key, out o);
Func<T, string> method = (Func<T, string>)o;
if (method == null)
{
method = expression.Compile();
cache[key] = method;
}
return method.Invoke(input);
}
Upvotes: -1
Reputation: 2185
If your goal is to compile + invoke for "extract value" from expression, may be you look into another way.
I trying to extract value from expression tree without compilation through reflection.
My solution is not fully support all expressions, at first it was written for caching method invokation without lambda and arithmetic, but with some improvements it can help.
Here it is:
private static object ExtractValue(Expression expression, object[] input, ReadOnlyCollection<ParameterExpression> parameters)
{
if (expression == null)
{
return null;
}
var ce = expression as ConstantExpression;
if (ce != null)
{
return ce.Value;
}
var pe = expression as ParameterExpression;
if (pe != null)
{
return input[parameters.IndexOf(pe)];
}
var ma = expression as MemberExpression;
if (ma != null)
{
var se = ma.Expression;
object val = null;
if (se != null)
{
val = ExtractValue(se, input, parameters);
}
var fi = ma.Member as FieldInfo;
if (fi != null)
{
return fi.GetValue(val);
}
else
{
var pi = ma.Member as PropertyInfo;
if (pi != null)
{
return pi.GetValue(val);
}
}
}
var mce = expression as MethodCallExpression;
if (mce != null)
{
return mce.Method.Invoke(ExtractValue(mce.Object, input, parameters), mce.Arguments.Select(a => ExtractValue(a, input, parameters)).ToArray());
}
var sbe = expression as BinaryExpression;
if (sbe != null)
{
var left = ExtractValue(sbe.Left, input, parameters);
var right = ExtractValue(sbe.Right, input, parameters);
// TODO: check for other types and operands
if (sbe.NodeType == ExpressionType.Add)
{
if (left is int && right is int)
{
return (int) left + (int) right;
}
}
throw new NotImplementedException();
}
var le = expression as LambdaExpression;
if (le != null)
{
return ExtractValue(le.Body, input, le.Parameters);
}
// TODO: Check for other expression types
var dynamicInvoke = Expression.Lambda(expression).Compile().DynamicInvoke();
return dynamicInvoke;
}
With usage:
private static string ToString<T>(Expression<Func<T, string>> expression, T input)
{
var sw = Stopwatch.StartNew();
var method = expression.Compile();
var invoke = method.Invoke(input);
sw.Stop();
Console.WriteLine("Compile + Invoke: {0}, {1} ms", invoke, sw.Elapsed.TotalMilliseconds);
sw.Restart();
var r2 = ExtractValue(expression, new object[] {input}, null);
sw.Stop();
Console.WriteLine("ExtractValue: {0}, {1} ms", r2, sw.Elapsed.TotalMilliseconds);
return invoke;
}
I think, with some improvements and additional expression types this solution could be faster alternative for Compile().DynamicInvoke()
Upvotes: 0
Reputation: 27105
The problem you describe is severe in the sense that evaluating two expressions for semantic equality is at least as expensive as compiling the expression. To illustrate this, here is a link to an implementation for expression equality. This implementation is not perfect, for example:
MethodA() { MethodB(); }
MethodB() { ... }
In the example above, MethodA
and MethodB
are equivalent in the sense that they do the same thing, and you most likely want to consider them as equivalent. For example, building this in C# with compiler optimization enabled will replace the MethodB
call with MethodA
calls. There are a myriad of issues when comparing code and it is a topic of ongoing research.
You should consider a design where expressions are referenced by some kind of key that identifies it if you find out that compiling the expressions is the bottleneck in your application. By the time you have determined equality you could have already compiled it.
To comment on J0HN's answer, it compares the hash code of the body and the parameters, this is by no means a reliable solution, since it does not do a deep evaluation of the expression tree.
Also, take a look at this question as posted in the comments.
Upvotes: 1
Reputation: 26941
The reason why you can't use Dictionary<Expression<Func<T, string>>, Func<T, string>>
is that Expression<T>
GetHashCode
is not smart enough to detect "equal" expressions. I'm not sure, but it's quite likely that Expression<T>.GetHashCode
returns memory address of expression.
To address this issue you could introduce a more "smart" hash calculation. Let's consider equal expressions that have equal bodies. It's quite slippery path, but if you are willing to take responsibility on ensuring that:
you could achieve what you want.
Here's simple proof of concept code I've assembled for you at pastebin. It's not industrial strength (some hints in comments to improve it), however it clearly demonstrates the feasibility of approach.
A couple of things to consider before elaborating further: improper hash function might result in quite tricky bugs. So, think twice, write a lot of unit tests and prey :)
Upvotes: 2