Reputation: 763
Consider the sample table:
Col1 , Col2 , Col3
1 , x , G
1 , y , H
2 , z , J
2 , a , K
2 , a , K
3 , b , E
I want the result below, i.e. distinct rows:
1 , x , G
1 , y , H
2 , z , J
2 , a , K
3 , b , E
I tried
var Result = Context.Table.Select(C =>
    new {
        Col1 = C.Col1,
        Col2 = C.Col2,
        Col3 = C.Col3
    }).Distinct();
and
Context.Table.GroupBy(x=>new {x.Col1,x.Col2,x.Col3}).Select(x=>x.First()).ToList();
Both return the expected result. However, my table has 35 columns and 1 million records, and its size will keep growing. The query currently takes 22-30 seconds. How can I improve the performance and get it down to 2-3 seconds?
Upvotes: 2
Views: 1392
Reputation: 37299
Using Distinct is the way to go... I'd say that the first approach you tried is the correct one - but do you really need all 1 million rows? See what Where conditions you can add, or maybe take just the first X records:
var Result = Context.Table.Select(c => new
    {
        Col1 = c.Col1,
        Col2 = c.Col2,
        Col3 = c.Col3
    })
    .Where(c => /*some condition to narrow results*/)
    .Take(1000) // take only some limited number of records
    .Distinct();
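One thing to note about the snippet above: Take(1000) runs before Distinct(), so the de-duplication only sees the first 1000 source rows. If the goal is 1000 distinct rows, a variant along these lines (keeping the same placeholder condition) filters first, de-duplicates, and then limits:

var Result = Context.Table
    .Where(c => /*some condition to narrow results*/) // filter as early as possible
    .Select(c => new { c.Col1, c.Col2, c.Col3 })      // project only the needed columns
    .Distinct()                                       // DISTINCT over the projected columns
    .Take(1000);                                      // then cap the number of distinct rows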
What you might be able to do is use the rownum to select in batches. Something like:
public <return type> RetrieveBulk(int fromRow, int toRow)
{
    return Context.Table.Where(record => record.Rownum >= fromRow && record.Rownum < toRow)
        .Select(c => new
        {
            Col1 = c.Col1,
            Col2 = c.Col2,
            Col3 = c.Col3
        }).Distinct();
}
With this code you can then do something like:
List<Task<return type>> selectTasks = new List<Task<return type>>();
for (int i = 0; i < 1000000; i += 1000)
{
    int from = i; // capture a copy of the loop variable so each task reads its own range
    selectTasks.Add(Task.Run(() => RetrieveBulk(from, from + 1000)));
}
Task.WaitAll(selectTasks.ToArray());
// And then intersect the data using an efficient structure such as a HashSet, so the intersection is O(n) rather than O(n^2)
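To make the batched approach runnable end to end, here is a minimal sketch under a few assumptions that are not in the original question: EF Core, a hypothetical Row entity and MyContext keyed by a Rownum column, and a RowDto record standing in for the three projected columns. It also shows the HashSet merge the comment above alludes to.

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical value-equal DTO for the three projected columns; record equality lets a HashSet de-duplicate it.
public record RowDto(int Col1, string Col2, string Col3);

// Hypothetical entity and context; the real table has 35 columns.
public class Row
{
    [Key]
    public int Rownum { get; set; }
    public int Col1 { get; set; }
    public string Col2 { get; set; } = "";
    public string Col3 { get; set; } = "";
}

public class MyContext : DbContext
{
    public DbSet<Row> Table => Set<Row>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("<connection string>");
}

public static class BulkReader
{
    // One batch: rows with Rownum in [fromRow, toRow), projected and de-duplicated in SQL,
    // then materialized into RowDto instances.
    public static List<RowDto> RetrieveBulk(int fromRow, int toRow)
    {
        using var context = new MyContext(); // a DbContext is not thread-safe, so use one per task
        return context.Table
            .Where(r => r.Rownum >= fromRow && r.Rownum < toRow)
            .Select(r => new { r.Col1, r.Col2, r.Col3 })
            .Distinct()
            .AsEnumerable()
            .Select(x => new RowDto(x.Col1, x.Col2, x.Col3))
            .ToList();
    }

    // Runs the batches in parallel and merges them; duplicates that span two batches
    // collapse in the HashSet, keeping the merge O(n) overall.
    public static HashSet<RowDto> RetrieveAllDistinct(int totalRows, int bulkSize)
    {
        var tasks = new List<Task<List<RowDto>>>();
        for (int i = 0; i < totalRows; i += bulkSize)
        {
            int from = i; // capture a copy so each task reads its own range
            tasks.Add(Task.Run(() => RetrieveBulk(from, from + bulkSize)));
        }
        Task.WaitAll(tasks.ToArray());

        var result = new HashSet<RowDto>();
        foreach (var task in tasks)
            result.UnionWith(task.Result);
        return result;
    }
}

Usage would then be something like var distinctRows = BulkReader.RetrieveAllDistinct(1000000, 1000);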
Upvotes: 1