RoughPlace
RoughPlace

Reputation: 1121

Slow Performance of Linq Where statement

I have a List of Objects (roughly 100k) that is must iterate upon in order to produce a Dictionary. however the code is performing very slowly, specifically on one line

public class Item{
        public int ID;
        public int Secondary_ID;
        public string Text;
        public int Number;
}

Data Looks something like (100k lines)

ID  | Secondary_ID |      Text       | Number
1   |    1         | "something"     | 3
1   |    1         | "something else"| 7
1   |    1         | "something1"    | 4
1   |    2         | "something2"    | 344
2   |    3         | "something3"    | 74
2   |    3         | "something4"    | 1

and i would like it to look like this when finished. (any collection will do to be honest)

 Dictionary<int, string> 

Key             | Value
(secondary_ID)  | (Text : Number)

1               | "Something : 3, Something else : 7, Something1 : 4"
2               | "Something2 : 344"
3               | "Something3 : 74, Something4 : 1"

My code currently works like this ListAll contains all data.

var Final=new Dictionary<int, string>();
var id1s=ListAll.Select(x => x.ID).Distinct().ToList();

foreach(var id1 in id1s) {
    var shortList=ListAll.Where(x => x.ID==id1).ToList(); //99% of time spent is here
    var id2s=shortList.Select(x => x.Secondary_ID).Distinct().ToList();

    foreach(var id2 in id2s) {
        var s=new StringBuilder();
        var items=shortList.Where(x => x.Secondary_ID==id2).ToList();

        foreach(var i in items) {
            s.Append(String.Format("{0} : {1}", i.Text, i.Number));
        }

        Final.Add(id2, s.ToString());
    }
}

return Final;

now the output is correct however as stated in the above comment, this takes an incredibly long time to process (90 seconds - certainly more than i am comfortable with) and was wondering if there is a faster way of achieving this.

This code is only going to be used once so is not really a normal usage and normally I would ignore it for that reason, but was wondering for learning purposes.

Upvotes: 1

Views: 5889

Answers (3)

Ken Kin
Ken Kin

Reputation: 4693

First, remove everywhere ToList(), it should becomes faster; because ToList() performs eager evaluation.

I think what your code expects to do is:

var Final=new Dictionary<int, string>();

foreach(var x in ListAll)
    if(Final.ContainsKey(x.Secondary_ID))
        Final[x.Secondary_ID]+=String.Format(", {0} : {1}", x.Text, x.Number);
    else
        Final.Add(x.Secondary_ID, String.Format("{0} : {1}", x.Text, x.Number));

return Final;

A Dictionary cannot contain a duplicate key, so it's no matter here you use ID or Secondary_ID, if your Secondary_ID must be in the range of existing ID; and you even do not need Distinct() in the code.

By doing some simplification, original code would be:

foreach(var id1 in ListAll.Select(x => x.ID).Distinct()) {
    foreach(var id2 in ListAll.Where(x => x.ID==id1).Select(x => x.Secondary_ID).Distinct()) {
        var s=new StringBuilder();

        foreach(var i in ListAll.Where(x => x.ID==id1).Where(x => x.Secondary_ID==id2)) {
            s.Append(String.Format("{0} : {1}", i.Text, i.Number));
        }

        Final.Add(id2, s.ToString());
    }
}

Upvotes: 0

lc.
lc.

Reputation: 116448

Here's what I would do (untested, but hopefully you get the idea):

var final = ListAll.GroupBy(x => x.Secondary_ID)
                   .ToDictionary(x => x.Key, x => String.Join(", ", 
                       x.Select(y => String.Format("{0} : {1}", 
                           y.Text, y.Number)))

This first groups by Secondary_ID using GroupBy, then puts the result into a dictionary using ToDictionary.

The GroupBy will group your data into the following groups:

Key = 1:

ID  | Secondary_ID |      Text       | Number
1   |    1         | "something"     | 3
1   |    1         | "something else"| 7
1   |    1         | "something1"    | 4

Key = 2:
ID  | Secondary_ID |      Text       | Number
1   |    2         | "something2"    | 344

Key = 3:
ID  | Secondary_ID |      Text       | Number
2   |    3         | "something3"    | 74
2   |    3         | "something4"    | 1

Then the .ToDictionary method:

  • Selects the key as x.Key (the key we grouped on, i.e. Secondary_ID).
  • Selects the result of a String.Join operation as the value. What is being joined is the collection of "Text : Number" from the elements inside that group - x.Select(y => String.Format("{0} : {1}", y.Text, y.Number).

Upvotes: 8

Servy
Servy

Reputation: 203814

A much more efficient (and even easier to write) method of grouping the items by ID is to use GroupBy.

var query = ListAll.GroupBy(x => x.Secondary_ID)
    .ToDictionary(group => group.Key,
        group => string.Join(", ",
             group.Select(item => string.Format("{0} : {1}",item.Text , item.Number))),
    //consider refactoring part of this line out to another method
    });

As for the reason that your code is so slow, you're searching through the entire list for each distinct ID. That's an O(n^2) operation. GroupBy doesn't do that. It uses a hash based structure internally, based on whatever you're grouping on, so that it can quickly (in O(1) time) find the bucket that any given item belongs in, as opposed to the O(n) time it takes your method.

Upvotes: 8

Related Questions