Reputation: 23229
I have the concept of a Session which stores objects in various states.
Sometimes I need to scan the Session for objects matching a particular query but I do this a lot and performance testing has shown it is becoming a bottleneck in some areas.
Therefore I would like to introduce the concept of indexes on a Session.
Something like...
public IDictionary<K, V> GetIndex<K, V>(Func<V, K> keySelector)
However I'm not sure about how to test "equality" of a Func like this. Obviously I want the index to only be built on the first call to GetIndex and subsequent calls to not build it again.
How should I be mapping these internally to do index existence lookups?
IDictionary<???, IDictionary<K, V>> indexes = ...
Basically how should I be storing the ???. Maybe I can't do this using a Func but perhaps there is some other way.
Upvotes: 8
Views: 2239
Reputation: 7342
Comparing expressions in loops could likely be more time consuming than selecting over the dictionary. As already pointed in the thread there are ways to compare them but very time consuming and not accurate:
x => x.Key == 1
vs
y => y.Key == 1
vs
int value = 1
x => x.Key == value
will give false
So creating indexes ad hoc is not a good solution.
What you could do is have an Indexing factory class with predefined expression templates that create expressions at first call to some parameter combination, and use them (instances) with .Equals by reference.
Something like (pseudoC#ode):
static class Indexfactory {
private static Dictionary<IndexcreationParams,Expression> ...
// more of these as required
public static Expression getIndex<Tret,P1,P2,P3,...>(IndexType type, P1 p1,P2 p2,P3 p3...) {
// create expression from template with the supplied parameters
// if not already existent, else rerturn it from static storage
// store expression in some private storage
}
}
Then store the expression as the key in the dictionary with a list of results when it gets executed first. On following executions check if you have cached results for this expression, since if you use the factory you will always get the same reference.
Upvotes: 1
Reputation: 15811
You could consider using an Expression<Func<K,V>>
and then Compile()
the expression when you need to execute.
For checking equality, take a look at this SO question:
How to check if two Expression<Func<T, bool>> are the same
Alternatively, you could give the indexes a name and continue to use the delegate:
public IDictionary<K, V> GetIndex<K, V>(string indexName, Func<V, K> keySelector)
IDictionary<string, IDictionary<K, V>> indexes = ..
Upvotes: 1
Reputation: 36082
The simplest approach is probably to compute a hash of the query, and insert the results into your dictionary using the hash as the key.
If your queries are strings, you can probably just use the string.GetHashCode function to compute a simple hash on the string data. If your queries are Linq queries, .GetHashCode probably won't work unless Linq specifically overrides this method to compute a hash over the expression tree instead of the default object instance pointer. The default implementation of .GetHashCode simply returns a value that is derived from the object instance identity in memory, with no consideration of the data content of the object.
If your queries are strings and are fairly uniform/consistent in construction, computing a simple string hash should be sufficient for reducing query traffic using the cache. If your queries are less consistent in structure (equivalent queries but with arguments in a different order, for example) you may need to build your own hash function that computes a hash on a canonicalized form of the input query to improve cache hit rates for queries that are logically equivalent but textually different.
As your hash computation grows more computationally expensive, it will diminish the performance gains of using a cache. Make sure the query operation is sufficiently expensive to justify spending time computing hashes and consuming memory for the cache to produce a net savings in execution time. The query operation should be at least 2 or more orders of magnitude greater than the hash calc and cache management overhead. If your query operation is an out of process or cross-network call, your cache overhead will almost certainly be dwarfed by the cost of the query.
Upvotes: 1