Reputation: 1121
I have a List
of Objects
(roughly 100k) that is must iterate upon in order to produce a Dictionary
.
however the code is performing very slowly, specifically on one line
public class Item{
public int ID;
public int Secondary_ID;
public string Text;
public int Number;
}
Data Looks something like (100k lines)
ID | Secondary_ID | Text | Number
1 | 1 | "something" | 3
1 | 1 | "something else"| 7
1 | 1 | "something1" | 4
1 | 2 | "something2" | 344
2 | 3 | "something3" | 74
2 | 3 | "something4" | 1
and i would like it to look like this when finished. (any collection will do to be honest)
Dictionary<int, string>
Key | Value
(secondary_ID) | (Text : Number)
1 | "Something : 3, Something else : 7, Something1 : 4"
2 | "Something2 : 344"
3 | "Something3 : 74, Something4 : 1"
My code currently works like this ListAll
contains all data.
var Final=new Dictionary<int, string>();
var id1s=ListAll.Select(x => x.ID).Distinct().ToList();
foreach(var id1 in id1s) {
var shortList=ListAll.Where(x => x.ID==id1).ToList(); //99% of time spent is here
var id2s=shortList.Select(x => x.Secondary_ID).Distinct().ToList();
foreach(var id2 in id2s) {
var s=new StringBuilder();
var items=shortList.Where(x => x.Secondary_ID==id2).ToList();
foreach(var i in items) {
s.Append(String.Format("{0} : {1}", i.Text, i.Number));
}
Final.Add(id2, s.ToString());
}
}
return Final;
now the output is correct however as stated in the above comment, this takes an incredibly long time to process (90 seconds - certainly more than i am comfortable with) and was wondering if there is a faster way of achieving this.
This code is only going to be used once so is not really a normal usage and normally I would ignore it for that reason, but was wondering for learning purposes.
Upvotes: 1
Views: 5889
Reputation: 4693
First, remove everywhere ToList()
, it should becomes faster; because ToList()
performs eager evaluation.
I think what your code expects to do is:
var Final=new Dictionary<int, string>();
foreach(var x in ListAll)
if(Final.ContainsKey(x.Secondary_ID))
Final[x.Secondary_ID]+=String.Format(", {0} : {1}", x.Text, x.Number);
else
Final.Add(x.Secondary_ID, String.Format("{0} : {1}", x.Text, x.Number));
return Final;
A Dictionary
cannot contain a duplicate key, so it's no matter here you use ID
or Secondary_ID
, if your Secondary_ID
must be in the range of existing ID
; and you even do not need Distinct()
in the code.
By doing some simplification, original code would be:
foreach(var id1 in ListAll.Select(x => x.ID).Distinct()) {
foreach(var id2 in ListAll.Where(x => x.ID==id1).Select(x => x.Secondary_ID).Distinct()) {
var s=new StringBuilder();
foreach(var i in ListAll.Where(x => x.ID==id1).Where(x => x.Secondary_ID==id2)) {
s.Append(String.Format("{0} : {1}", i.Text, i.Number));
}
Final.Add(id2, s.ToString());
}
}
Upvotes: 0
Reputation: 116448
Here's what I would do (untested, but hopefully you get the idea):
var final = ListAll.GroupBy(x => x.Secondary_ID)
.ToDictionary(x => x.Key, x => String.Join(", ",
x.Select(y => String.Format("{0} : {1}",
y.Text, y.Number)))
This first groups by Secondary_ID
using GroupBy
, then puts the result into a dictionary using ToDictionary
.
The GroupBy
will group your data into the following groups:
Key = 1: ID | Secondary_ID | Text | Number 1 | 1 | "something" | 3 1 | 1 | "something else"| 7 1 | 1 | "something1" | 4 Key = 2: ID | Secondary_ID | Text | Number 1 | 2 | "something2" | 344 Key = 3: ID | Secondary_ID | Text | Number 2 | 3 | "something3" | 74 2 | 3 | "something4" | 1
Then the .ToDictionary
method:
x.Key
(the key we grouped on, i.e. Secondary_ID
).String.Join
operation as the value. What is being joined is the collection of "Text : Number" from the elements inside that group - x.Select(y => String.Format("{0} : {1}", y.Text, y.Number)
.Upvotes: 8
Reputation: 203814
A much more efficient (and even easier to write) method of grouping the items by ID is to use GroupBy
.
var query = ListAll.GroupBy(x => x.Secondary_ID)
.ToDictionary(group => group.Key,
group => string.Join(", ",
group.Select(item => string.Format("{0} : {1}",item.Text , item.Number))),
//consider refactoring part of this line out to another method
});
As for the reason that your code is so slow, you're searching through the entire list for each distinct ID. That's an O(n^2) operation. GroupBy
doesn't do that. It uses a hash based structure internally, based on whatever you're grouping on, so that it can quickly (in O(1) time) find the bucket that any given item belongs in, as opposed to the O(n) time it takes your method.
Upvotes: 8