Navyseal
Navyseal

Reputation: 901

Selecting unique values of different columns using LINQ

I have a table (orders for ex) which has Multiple Columns.

products    categories  subcategories
--------------------------------------

prod1       cat1        sub1
prod1       cat2        sub2
prod2       cat3        sub6
prod1       cat1        sub1
prod5       cat2        sub8
prod2       cat1        sub1
prod1       cat7        sub3
prod8       cat2        sub2
prod2       cat3        sub1

Now I can write three different queries to get distinct values

var prod = (from p in _context.orders select p.products).ToList().Distinct();

similarly I can write it for others.

Now I need to get the distinct values of each column in a single query for which the result needs to look like

products    categories  subcategories
--------------------------------------

prod1       cat1        sub1
prod2       cat2        sub2
prod5       cat3        sub6
prod8       cat7        sub8
                        sub3

My ClassType for unique fields looks like this

public class UniqueProductFields
{
    public IEnumerable<string> Products { get; set; }
    public IEnumerable<string> Categories { get; set; }
    public IEnumerable<string> Subcategories { get; set; }
}   

Not sure how to do this in an efficient manner so that I dont have to write three methods. The table is in the database (hence the need for optimization)

Thanks!

Upvotes: 1

Views: 1186

Answers (2)

Aar&#243;nBC.
Aar&#243;nBC.

Reputation: 1330

As far as i know, you won't be able to do it in a single query. Before thinking how would you do it with C# think how would you do it in SQL; I might be wrong but to me you'll be writing 3 querys anyway.

If you notice some performance issues and this is your actual code:

var prod = (from p in _context.orders select p.products).ToList().Distinct();

You may want to start by removing the .ToList() extension method beacuse that is retrieveng all records to memory and only after that the distinction is applied.

That's because your query expression (from p in ...) returns an IQueryable and calling .ToList() on it makes it IEnumerable. force the current formed SQL query to run and bring the results to memory.

The difference in this case is: Deferred execution

See: https://www.c-sharpcorner.com/UploadFile/rahul4_saxena/ienumerable-vs-iqueryable/

Upvotes: 0

ErikE
ErikE

Reputation: 50231

Is it an absolutely unchangeable requirement to use Linq? Why do you need it to be returned in a single query?

Suggestion: Use SQL. It can be done in a single query but you won't like the query. I'm assuming SQL Server (can be done differently for other DBMSes).

WITH V AS (
   SELECT DISTINCT
      V.*
   FROM
      Orders O
      CROSS APPLY (
         VALUES (1, O.Products), (2, O.Categories), (3, O.Subcategories)
      ) V (Which, Value)
),
Nums AS (
   SELECT
      Num = Row_Number() OVER (PARTITION BY V.Which ORDER BY V.Value),
      V.Which,
      V.Value
   FROM
      V
)
SELECT
   Products = P.[1],
   Categories = P.[2],
   Subcategories = P.[3]
FROM
   Nums N
   PIVOT (Max(N.Value) FOR N.Which IN ([1], [2], [3])) P
;

See this working at db<>fiddle

Output:

Products  Categories  Subcategories
--------  ----------  -------------
prod1     cat1        sub1
prod2     cat2        sub2
prod5     cat3        sub3
prod8     cat7        sub6
null      null        sub8

If you are bound and determined to use Linq, well, I can't help you with the query-style syntax. I only know the C# code style syntax, but here's a stab at that. Unfortunately, I don't think this will do you any good, because I had to use some pretty funky stuff to make it work. It uses essentially the same technique as the SQL query above, only, there's no equivalent of PIVOT in Linq and there's no real natural row object other than a custom class.

using System;
using System.Collections.Generic;
using System.Linq;

public class Program {
    public static void Main() {
        var data = new List<Order> {
            new Order("prod1", "cat1", "sub1"),
            new Order("prod1", "cat2", "sub2"),
            new Order("prod2", "cat3", "sub6"),
            new Order("prod1", "cat1", "sub1"),
            new Order("prod5", "cat2", "sub8"),
            new Order("prod2", "cat1", "sub1"),
            new Order("prod1", "cat7", "sub3"),
            new Order("prod8", "cat2", "sub2"),
            new Order("prod2", "cat3", "sub1")
        };
        int max = 0;
        var items = data
            .SelectMany(o => new List<KeyValuePair<int, string>> {
                new KeyValuePair<int, string>(1, o.Products),
                new KeyValuePair<int, string>(2, o.Categories),
                new KeyValuePair<int, string>(3, o.Subcategories)
            })
            .Distinct()
            .GroupBy(d => d.Key)
            .Select(g => {
                var l = g.Select(d => d.Value).ToList();
                max = Math.Max(max, l.Count);
                return l;
            })
            .ToList();
        Enumerable
            .Range(0, max)
            .Select(i => new {
                p = items[0].ItemAtOrDefault(i, null),
                c = items[1].ItemAtOrDefault(i, null),
                s = items[2].ItemAtOrDefault(i, null)
            })
            .ToList()
            .ForEach(row => Console.WriteLine($"p: {row.p}, c: {row.c}, s: {row.s}"));
    }
}

public static class ListExtensions {
    public static T ItemAtOrDefault<T>(this List<T> list, int index, T defaultValue)
        => index >= list.Count ? defaultValue : list[index];
}

public class Order {
    public Order(string products, string categories, string subcategories) {
        Products = products;
        Categories = categories;
        Subcategories = subcategories;
    }
    public string Products { get; set; }
    public string Categories { get; set; }
    public string Subcategories { get; set; }
}

I suppose that we could swap this

.Select(i => new {
   p = items[0].ItemAtOrDefault(i, null),
   c = items[1].ItemAtOrDefault(i, null),
   s = items[2].ItemAtOrDefault(i, null)
})

for this:

.Select(i => new Order(
   items[0].ItemAtOrDefault(i, null),
   items[1].ItemAtOrDefault(i, null),
   items[2].ItemAtOrDefault(i, null)
))

Then use that class's properties in the output section.

Upvotes: 5

Related Questions