Reputation: 901
I have a table (orders for ex) which has Multiple Columns.
products categories subcategories
--------------------------------------
prod1 cat1 sub1
prod1 cat2 sub2
prod2 cat3 sub6
prod1 cat1 sub1
prod5 cat2 sub8
prod2 cat1 sub1
prod1 cat7 sub3
prod8 cat2 sub2
prod2 cat3 sub1
Now I can write three different queries to get distinct values
var prod = (from p in _context.orders select p.products).ToList().Distinct();
similarly I can write it for others.
Now I need to get the distinct values of each column in a single query for which the result needs to look like
products categories subcategories
--------------------------------------
prod1 cat1 sub1
prod2 cat2 sub2
prod5 cat3 sub6
prod8 cat7 sub8
sub3
My ClassType for unique fields looks like this
public class UniqueProductFields
{
public IEnumerable<string> Products { get; set; }
public IEnumerable<string> Categories { get; set; }
public IEnumerable<string> Subcategories { get; set; }
}
Not sure how to do this in an efficient manner so that I dont have to write three methods. The table is in the database (hence the need for optimization)
Thanks!
Upvotes: 1
Views: 1186
Reputation: 1330
As far as i know, you won't be able to do it in a single query. Before thinking how would you do it with C# think how would you do it in SQL; I might be wrong but to me you'll be writing 3 querys anyway.
If you notice some performance issues and this is your actual code:
var prod = (from p in _context.orders select p.products).ToList().Distinct();
You may want to start by removing the .ToList()
extension method beacuse that is retrieveng all records to memory and only after that the distinction is applied.
That's because your query expression (from p in ...
) returns an IQueryable
and calling .ToList()
on it makes it force the current formed SQL query to run and bring the results to memory.IEnumerable
.
The difference in this case is: Deferred execution
See: https://www.c-sharpcorner.com/UploadFile/rahul4_saxena/ienumerable-vs-iqueryable/
Upvotes: 0
Reputation: 50231
Is it an absolutely unchangeable requirement to use Linq? Why do you need it to be returned in a single query?
Suggestion: Use SQL. It can be done in a single query but you won't like the query. I'm assuming SQL Server (can be done differently for other DBMSes).
WITH V AS (
SELECT DISTINCT
V.*
FROM
Orders O
CROSS APPLY (
VALUES (1, O.Products), (2, O.Categories), (3, O.Subcategories)
) V (Which, Value)
),
Nums AS (
SELECT
Num = Row_Number() OVER (PARTITION BY V.Which ORDER BY V.Value),
V.Which,
V.Value
FROM
V
)
SELECT
Products = P.[1],
Categories = P.[2],
Subcategories = P.[3]
FROM
Nums N
PIVOT (Max(N.Value) FOR N.Which IN ([1], [2], [3])) P
;
Output:
Products Categories Subcategories
-------- ---------- -------------
prod1 cat1 sub1
prod2 cat2 sub2
prod5 cat3 sub3
prod8 cat7 sub6
null null sub8
If you are bound and determined to use Linq, well, I can't help you with the query-style syntax. I only know the C# code style syntax, but here's a stab at that. Unfortunately, I don't think this will do you any good, because I had to use some pretty funky stuff to make it work. It uses essentially the same technique as the SQL query above, only, there's no equivalent of PIVOT
in Linq and there's no real natural row object other than a custom class.
using System;
using System.Collections.Generic;
using System.Linq;
public class Program {
public static void Main() {
var data = new List<Order> {
new Order("prod1", "cat1", "sub1"),
new Order("prod1", "cat2", "sub2"),
new Order("prod2", "cat3", "sub6"),
new Order("prod1", "cat1", "sub1"),
new Order("prod5", "cat2", "sub8"),
new Order("prod2", "cat1", "sub1"),
new Order("prod1", "cat7", "sub3"),
new Order("prod8", "cat2", "sub2"),
new Order("prod2", "cat3", "sub1")
};
int max = 0;
var items = data
.SelectMany(o => new List<KeyValuePair<int, string>> {
new KeyValuePair<int, string>(1, o.Products),
new KeyValuePair<int, string>(2, o.Categories),
new KeyValuePair<int, string>(3, o.Subcategories)
})
.Distinct()
.GroupBy(d => d.Key)
.Select(g => {
var l = g.Select(d => d.Value).ToList();
max = Math.Max(max, l.Count);
return l;
})
.ToList();
Enumerable
.Range(0, max)
.Select(i => new {
p = items[0].ItemAtOrDefault(i, null),
c = items[1].ItemAtOrDefault(i, null),
s = items[2].ItemAtOrDefault(i, null)
})
.ToList()
.ForEach(row => Console.WriteLine($"p: {row.p}, c: {row.c}, s: {row.s}"));
}
}
public static class ListExtensions {
public static T ItemAtOrDefault<T>(this List<T> list, int index, T defaultValue)
=> index >= list.Count ? defaultValue : list[index];
}
public class Order {
public Order(string products, string categories, string subcategories) {
Products = products;
Categories = categories;
Subcategories = subcategories;
}
public string Products { get; set; }
public string Categories { get; set; }
public string Subcategories { get; set; }
}
I suppose that we could swap this
.Select(i => new {
p = items[0].ItemAtOrDefault(i, null),
c = items[1].ItemAtOrDefault(i, null),
s = items[2].ItemAtOrDefault(i, null)
})
for this:
.Select(i => new Order(
items[0].ItemAtOrDefault(i, null),
items[1].ItemAtOrDefault(i, null),
items[2].ItemAtOrDefault(i, null)
))
Then use that class's properties in the output section.
Upvotes: 5