Aaron
Aaron

Reputation: 356

LINQ get column based on another column having count >1

I am struggling to get column A value by querying a column B for duplicates from database with Entity Framework Core. Example:

Column A | Column B
--------------------
A        | 1
B        | 1 
C        | 2 
D        | 2 
E        | 3 

I should get output:

A,B,C,D

Which I plan to use to populate a drop down list.

I’ve tried:

Options = _context.table.AsEnumerable()
.GroupBy(s => new{ s.columnA,s.columnB)
.Where(o => o.columnB.Count() > 1)
.Select(o => new SelectListItem
{
   Value = o.Key.columnA,
   Text = o.Key.columnB
}.ToList()

Upvotes: 4

Views: 876

Answers (2)

Karan
Karan

Reputation: 12629

You want to count on columnB only then you should be using .GroupBy(s => s.columnB). Update your Where & Select like below.

Note Need to use SelectMany wiht Select inside it because if we use Select instead of SelectMany it will return List<List<SelectListItem>>. SelectMany will flatten it and return List<SelectListItem>.

Options = _context.table.AsEnumerable()
            .GroupBy(s => s.columnB)
            .Where(o => o.Count() > 1)
            .SelectMany(o => o.Select(x => new SelectListItem
            {
               Value = x.columnA,
               Text = x.columnB
            }))
            .ToList();

Detail Explanation Credits to @Flater from helpful comment. The lambda in GroupBy is essentially the "group identifier". In the question, each group is defined by the combined uniqueness of columns A and B. But as per described into question it needs to combine things based on column B alone, with the goal to specifically group things regardless of what column A contains, so column A should not be included in the group's identifier, since the query should not make a new group when encountering a different column A value.

Upvotes: 3

Harald Coppoolse
Harald Coppoolse

Reputation: 30474

So you have combinations of [ColumnA, ColumnB], and if you want all values of ColumnA where there is more than one ColumnB.

My advice would be to convert your [ColumnA, ColumnB] into groups of "ColumnB, with all ColumnA that have this ColumnB. Then keep only the Groups that have more than one ColumnA.

So in your example:

  • 1 with values A, B
  • 2 with values C, D
  • 3 with value E

Throw away group 3 because it has only one element, and flatten the result.

We use the overload of GroupBy that has an elementSelector. The elementSelector will select columnA as elements.

var combinationsColumnAColumnB = ...
var result = combinationsColumnB.GroupBy(
     // parameter keySelector: make groups with same ColumnB
     combination => combination.ColumnB,

    // parameter elementselector: put only the ColumnA in each group
    combination => combination.ColumnA)

    // result: groups of columnB with all columnA that belong to this columnB
    // keep only those groups that have more than one element
    .Where(group => group.Skip(1).Any()

    // and flatten the result:
    .SelectMany(group => group);

If you have combinations like:

A   1
A   2
B   1
C   2

You will get element A twice. If you don't want that, add a Distinct()

Upvotes: 0

Related Questions