Sebastian Krogull
Sebastian Krogull

Reputation: 1528

Combine Expressions instead of using multiple queries in Entity Framework

I have following generic queryable (which may already have selections applied):

IQueryable<TEntity> queryable = DBSet<TEntity>.AsQueryable();

Then there is the Provider class that looks like this:

public class Provider<TEntity>
{
    public Expression<Func<TEntity, bool>> Condition { get; set; }

    [...]
}

The Condition could be defined per instance in the following fashion:

Condition = entity => entity.Id == 3;

Now I want to select all Provider instances which have a Condition that is met at least by one entity of the DBSet:

List<Provider> providers = [...];
var matchingProviders = providers.Where(provider => queryable.Any(provider.Condition))

The problem with this: I'm starting a query for each Provider instance in the list. I'd rather use a single query to achieve the same result. This topic is especially important because of questionable performance. How can I achieve the same results with a single query and improve performance using Linq statements or Expression Trees?

Upvotes: 12

Views: 10685

Answers (5)

Ivan Stoev
Ivan Stoev

Reputation: 205579

Here is another view of the problem that has nothing to do with expressions.

Since the main goal is to improve the performance, if the attempts to produce the result with single query don't help, we could try improving the speed by parallelizing the execution of the original multi query solution.

Since it's really a LINQ to Objects query (which internally executes multiple EF queries), theoretically it should be a simple matter of turning it into a PLINQ query by inserting AsParallel like this (non working):

var matchingProviders = providers
    .AsParallel()
    .Where(provider => queryable.Any(provider.Condition))
    .ToList();

However, it turns out that EF DbContext is not well suited for multi thread access, and the above simply generates runtime errors. So I had to resort to TPL using one of the Parallel.ForEach overloads that allows us to supply local state, which I used to allocate several DbContext instances during the execution.

The final working code looks like this:

var matchingProviders = new List<Provider<TEntity>>();
Parallel.ForEach(providers,
    () => new
    {
        context = new MyDbContext(),
        matchingProviders = new List<Provider<TEntity>>()
    },
    (provider, state, data) =>
    {
        if (data.context.Set<TEntity>().Any(provider.Condition))
            data.matchingProviders.Add(provider);
        return data;
    },
    data =>
    {
        data.context.Dispose();
        if (data.matchingProviders.Count > 0)
        {
            lock (matchingProviders)
                matchingProviders.AddRange(data.matchingProviders);
        }
    }
);

If you have a multi core CPU (which is normal nowadays) and a good database server, this should give you the improvement you are seeking for.

Upvotes: 1

Sebastian Krogull
Sebastian Krogull

Reputation: 1528

The approach I tried here was to create Conditions and nest them into one Expression. If one of the Conditions is met, we get the index of the Provider for it.

private static Expression NestedExpression(
    IEnumerable<Expression<Func<TEntity, bool>>> expressions, 
    int startIndex = 0)
{
    var range = expressions.ToList();
    range.RemoveRange(0, startIndex);

    if (range.Count == 0)
        return Expression.Constant(-1);

    return Expression.Condition(
        range[0].Body, 
        Expression.Constant(startIndex), 
        NestedExpression(expressions, ++startIndex));
}

Because the Expressions still may use different ParameterExpressions, we need an ExpressionVisitor to rewrite those:

private class PredicateRewriterVisitor : ExpressionVisitor
{
    private readonly ParameterExpression _parameterExpression;

    public PredicateRewriterVisitor(ParameterExpression parameterExpression)
    {
        _parameterExpression = parameterExpression;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return _parameterExpression;
    }
}

For the rewrite we only need to call this method:

private static Expression<Func<T, bool>> Rewrite<T>(
    Expression<Func<T, bool>> exp, 
    ParameterExpression parameterExpression)
{
    var newExpression = new PredicateRewriterVisitor(parameterExpression).Visit(exp);
    return (Expression<Func<T, bool>>)newExpression;
}

The query itself and the selection of the Provider instances works like this:

var parameterExpression = Expression.Parameter(typeof(TEntity), "src");
var conditions = Providers.Select(provider => 
    Rewrite(provider.Condition, parameterExpression)
);

var nestedExpression = NestedExpression(conditions);
var lambda = Expression.Lambda<Func<TEntity, int>>(nestedExpression, parameterExpression);

var matchInfo = queryable.Select(lambda).Distinct();
var MatchingProviders = Providers.Where((provider, index) => matchInfo.Contains(index));

Note: Another option which isn't really fast as well

Upvotes: 1

Ivan Stoev
Ivan Stoev

Reputation: 205579

Here is another (crazy) idea that came in my mind. Please note that similar to my previous answer, it doesn't guarantee better performance (in fact it could be worse). It just presents a way to do what you are asking with a single SQL query.

Here we are going to create a query that returns a single string with length N consisting of '0' and '1' characters with '1' denoting a match (something like string bit array). The query will use my favorite group by constant technique to build dynamically something like this:

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(g =>
        (g.Max(Condition[0] ? "1" : "0")) +
        (g.Max(Condition[1] ? "1" : "0")) +
            ...
        (g.Max(Condition[N-1] ? "1" : "0")))
    .FirstOrDefault() ?? "";

And here is the code:

var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");

var concatArgs = providers.Select(provider => Expression.Call(
        typeof(Enumerable), "Max", new[] { typeof(TEntity), typeof(string) },
        group, Expression.Lambda(
            Expression.Condition(
                provider.Condition.Body, Expression.Constant("1"), Expression.Constant("0")),
            provider.Condition.Parameters)));

var concatCall = Expression.Call(
    typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), concatArgs));

var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .FirstOrDefault() ?? "";

var matchingProviders = matchInfo.Zip(providers,
    (match, provider) => match == '1' ? provider : null)
    .Where(provider => provider != null)
    .ToList();

Enjoy:)

P.S. In my opinion, this query will run with constant speed (regarding number and type of the conditions, i.e. can be considered O(N) in the best, worst and average cases, where N is the number of the records in the table) because the database has to perform always a full table scan. Still it will be interesting to know what's the actual performance, but most likely doing something like this just doesn't worth the efforts.

Update: Regarding the bounty and the updated requirement:

Find a fast query that only reads a record of the table once and ends the query if already all conditions are met

There is no standard SQL construct (not even speaking about LINQ query translation) that satisfies both conditions. The constructs that allow early end like EXISTS can be used for a single condition, thus when executed for multiple conditions will violate the first rule of reading the table record only once. While the constructs that use aggregates like in this answer satisfy the first rule, but in order to produce the aggregate value they have to read all the records, thus cannot exit earlier.

Shortly, there is no query that can satisfy both requirements. What about the fast part, it really depends of the size of the data and the number and type of the conditions, table indexes etc., so again there is simply no "best" general solution for all cases.

Upvotes: 4

Sebastian Krogull
Sebastian Krogull

Reputation: 1528

Based on this Post by @Ivan I created an expression that is slightly faster in some cases.

It uses Any instead of Max to get the desired results.

var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");

var anyMethod = typeof(Enumerable)
    .GetMethods()
    .First(m => m.Name == "Any" && m.GetParameters()
    .Count() == 2)
    .MakeGenericMethod(typeof(TEntity));

var concatArgs = Providers.Select(provider => 
    Expression.Call(anyMethod, group, 
    Expression.Lambda(provider.Condition.Body, provider.Condition.Parameters)));

var convertExpression = concatArgs.Select(concat =>
    Expression.Condition(concat, Expression.Constant("1"), Expression.Constant("0")));

var concatCall = Expression.Call(
    typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), convertExpression));

var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .First();

var MatchingProviders = matchInfo.Zip(Providers,
    (match, provider) => match == '1' ? provider : null)
    .Where(provider => provider != null)
    .ToList();

Upvotes: 3

Ivan Stoev
Ivan Stoev

Reputation: 205579

Interesting challenge. The only way I see is to build dynamically UNION ALL query like this:

SELECT TOP 1 0 FROM Table WHERE Condition[0]
UNION ALL
SELECT TOP 1 1 FROM Table WHERE Condition[1]
...
UNION ALL
SELECT TOP 1 N-1 FROM Table WHERE Condition[N-1]

and then use the returned numbers as index to get the matching providers.

Something like this:

var parameter = Expression.Parameter(typeof(TEntity), "e");
var indexQuery = providers
    .Select((provider, index) => queryable
        .Where(provider.Condition)
        .Take(1)
        .Select(Expression.Lambda<Func<TEntity, int>>(Expression.Constant(index), parameter)))
    .Aggregate(Queryable.Concat);

var indexes = indexQuery.ToList();
var matchingProviders = indexes.Select(index => providers[index]);

Note that I could have built the query without using Expression class by replacing the above Select with

.Select(_ => index)

but that would introduce unnecessary SQL query parameter for each index.

Upvotes: 5

Related Questions