Dejan Janjušević
Dejan Janjušević

Reputation: 3230

Unexpected generated SQL by EF Core 2.2 when joining on multiple columns

Consider the following classes:

public class Foo
{
    public int Id { get; set; }

    public string Type { get; set; }

    public int BarId { get; set; }
}

public class Bar
{
    public int Id { get; set; }

    public string Name { get; set; }
}

and the following DbContext:

public class TestDbContext : DbContext
{
    public DbSet<Foo> Foos { get; set; }

    public DbSet<Bar> Bars { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(@"Server=.;Database=ConditionalJoinEFCoreTest;Trusted_Connection=True;");
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 1, BarId = 1, Type = "Bar" });
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 2, BarId = 2, Type = "Bar" });
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 3, BarId = 1, Type = "Not Bar" });
        modelBuilder.Entity<Foo>().HasData(new Foo { Id = 4, BarId = 2, Type = "Not Bar" });

        modelBuilder.Entity<Bar>().HasData(new Bar { Id = 1, Name = "Bar 1" });
        modelBuilder.Entity<Bar>().HasData(new Bar { Id = 2, Name = "Bar 2" });
    }
}

Now let's query the data:

using (var ctx = new TestDbContext())
{
    var joinResult = ctx.Foos.GroupJoin(
      ctx.Bars,
      foo => new { Key = foo.BarId, PropName = foo.Type },
      bar => new { Key = bar.Id, PropName = "Bar" },
      (foo, bars) => new
      {
         Foo = foo,
         Bars = bars
      })
  .SelectMany(
      x => x.Bars.DefaultIfEmpty(),
      (foo, bar) => new 
      {
          Foo = foo.Foo,
          Bar = bar.Name
      });

    var result = joinResult.GroupBy(x => x.Foo.Id).Select(x => new
    {
        Id = x.Key,
        Name = x.Max(r => r.Bar)
    }).ToList();
}

This query will, as expected, yield the following SQL:

SELECT [foo].[Id], [foo].[BarId], [foo].[Type], [bar].[Name] AS [Bar]
FROM [Foos] AS [foo]
LEFT JOIN [Bars] AS [bar] ON ([foo].[BarId] = [bar].[Id]) AND ([foo].[Type] = N'Bar')
ORDER BY [foo].[Id]

However if we define the type:

public class ConditionalJoin
{
    public int Key { get; set; }

    public string PropName { get; set; }
}

...and then modify the LINQ query:

using (var ctx = new TestDbContext())
{
    var joinResult = ctx.Foos.GroupJoin(
      ctx.Bars,
      foo => new ConditionalJoin { Key = foo.BarId, PropName = foo.Type }, // <-- changed
      bar => new ConditionalJoin { Key = bar.Id, PropName = "Bar" }, // <-- changed
      (foo, bars) => new
      {
         Foo = foo,
         Bars = bars
      })
  .SelectMany(
      x => x.Bars.DefaultIfEmpty(),
      (foo, bar) => new 
      {
          Foo = foo.Foo,
          Bar = bar.Name
      });

    var result = joinResult.GroupBy(x => x.Foo.Id).Select(x => new
    {
        Id = x.Key,
        Name = x.Max(r => r.Bar)
    }).ToList();
}

Then the produced SQL looks like:

SELECT [foo0].[Id], [foo0].[BarId], [foo0].[Type]
FROM [Foos] AS [foo0]

SELECT [bar0].[Id], [bar0].[Name]
FROM [Bars] AS [bar0]

Why does this happen?

Upvotes: 1

Views: 371

Answers (1)

Ivan Stoev
Ivan Stoev

Reputation: 205579

Update (EF Core 3.x+): The issue has been fixed.

Original:

As I was suspecting, the problem is not the anonymous vs concrete type, but the unique C# compiler feature which emits special Expression.New call rather than the normal for such syntax Expression.MemberInit, and this is done only for anonymous types. It's exactly the same problem as in Selection in GroupBy query with NHibernate with dynamic anonymous object, so is the solution - generate class constructor with parameters and generate NewExpression with mapping arguments to class members.

Here is the prove of concept with the static class in question:

public class ConditionalJoin
{
    public ConditionalJoin(int key, string property)
    {
        Key = key;
        Property = property;
    }
    public int Key { get; }
    public string Property { get; }
    public static Expression<Func<T, ConditionalJoin>> Select<T>(Expression<Func<T, int>> key, Expression<Func<T, string>> property)
    {
        var parameter = key.Parameters[0];
        var body = Expression.New(
            typeof(ConditionalJoin).GetConstructor(new[] { typeof(int), typeof(string) }),
            new[] { key.Body, Expression.Invoke(property, parameter) },
            new [] { typeof(ConditionalJoin).GetProperty("Key"), typeof(ConditionalJoin).GetProperty("Property") });
        return Expression.Lambda<Func<T, ConditionalJoin>>(body, parameter);
    }
}

and the usage:

var joinResult = ctx.Foos.GroupJoin(
    ctx.Bars,
    ConditionalJoin.Select<Foo>(foo => foo.BarId, foo => foo.Type),
    ConditionalJoin.Select<Bar>(bar => bar.Id, bar => "Bar"),
    // the rest...

Of course if you want the query to work correctly event if evaluated client side (e.g. LINQ to Objects), the class must correct implement GetHashCode and Equals.


With that being said, actually EF Core supports another simpler alternative solution - using Tuples (not ValueTuples - these are still not supported in expression trees) instead on anonymous/concrete types. So the following also works correctly in EF Core (lloks like they have special case for Tuple types):

var joinResult = ctx.Foos.GroupJoin(
    ctx.Bars,
    foo => new Tuple<int, string>(foo.BarId, foo.Type),
    bar => new Tuple<int, string>(bar.Id, "Bar"),
    // the rest...

Upvotes: 3

Related Questions