Gerald Hughes
Gerald Hughes

Reputation: 6159

Entity Framework - remove duplicates

I want to remove duplicate records using Entity Framework.

This is what I've tried

var result = _context.History
            .GroupBy(s => new
                    {
                        s.Date,
                        s.EventId
                    })
            .SelectMany(grp => grp.Skip(1)).ToList();

_context.History.RemoveRange(result);
await _context.SaveChangesAsync();

But I get an error

System.InvalidOperationException: Processing of the LINQ expression 'grp => grp.Skip(1)' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core

I understand that this is breaking change for Entity Framework, but I really don't know how to update my code.

Upvotes: 0

Views: 5628

Answers (2)

Milos Gak
Milos Gak

Reputation: 21

In this case you are grouping by both columns:

var duplicate = DB.History.GroupBy(x => new { x.Date, x.EventId})
                         .Where(x => x.Count() > 1)
                         .SelectMany(x => x.ToList());

Upvotes: 1

ochzhen
ochzhen

Reputation: 156

Looks like Entity Framework doesn't know how to translate this Skip part of LINQ query. Moreover, it cannot make translate this GroupBy part. In EF Core 3 it will throw an exception to let us know :)

So, a dirty but simple way is to add AsEnumerable almost at the beginning, however, it will fetch all table and perform operations in memory:

var result = _context.History
            .AsEnumerable()
            .GroupBy(s => new { s.Date, s.EventId })
            .SelectMany(g => g.Skip(1))
            .ToList();

_context.History.RemoveRange(result);
await _context.SaveChangesAsync();

Since in most cases it's not acceptable to fetch everything we can split first request into two so that we download only duplicated records.

Second answer of this question might help, we can try something like this:

var keys = _context.History
                .GroupBy(s => new { s.Date, s.EventId })
                .Select(g => new { g.Key, Count = g.Count() })
                .Where(t => t.Count > 1)
                .Select(t => new { t.Key.Date, t.Key.EventId })
                .ToList();

var result = _context.History
    .Where(h => keys.Any(k => k.Date == h.Date && k.EventId == h.EventId))
    .AsEnumerable()
    .GroupBy(s => new { s.Date, s.EventId })
    .SelectMany(g => g.Skip(1))
    .ToList();

_context.History.RemoveRange(result);
await _context.SaveChangesAsync();

Upvotes: 1

Related Questions