Rajeev Kumar
Rajeev Kumar

Reputation: 4963

iterating through IEnumerable<string> causing serious performance issue

I am clue less about what has happend to performance of for loop when i tried to iterate through IEnumerable type.

Following is the code that cause serious performance issue

foreach (IEdge ed in edcol)
{
    IEnumerable<string> row = 
        from r in dtRow.AsEnumerable()
        where (((r.Field<string>("F1") == ed.Vertex1.Name) && 
                (r.Field<string>("F2") == ed.Vertex2.Name))
            || ((r.Field<string>("F1") == ed.Vertex2.Name) &&
                (r.Field<string>("F2") == ed.Vertex1.Name)))
        select r.Field<string>("EdgeId");
    int co = row.Count();
    //foreach (string s in row)
    //{

    //}
    x++;
}

The upper foreach(IEdge ed in edcol) has about 11000 iteration to complete. It runs in fraction of seconds if i remove the line

int co = row.Count();

from the code.

The row.Count() have maximum value of 10 in all loops.

If i Uncomment the

//foreach (string s in row)
//{

//}

it goes for about 10 minutes to complete the execution of code.

Does IEnumerable type have such a serious performance issues.. ??

Upvotes: 0

Views: 497

Answers (2)

Jon Skeet
Jon Skeet

Reputation: 1502466

This answer is for the implicit question of "how do I make this much faster"? Apologies if that's not actually what you were after, but...

You can go through the rows once, grouping by the names. (I haven't done the ordering like Marc has - I'm just looking up twice when querying :)

var lookup = dtRow.AsEnumerable()
                  .ToLookup(r => new { F1 = r.Field<string>("F1"),
                                       F2 = r.Field<string>("F2") });

Then:

foreach (IEdge ed in edcol)
{
    // Need to check both ways round...
    var first = new { F1 = ed.Vertex1.Name, F2 = ed.Vertex2.Name };
    var second = new { F1 = ed.Vertex2.Name, F2 = ed.Vertex1.Name };
    var firstResult = lookup[first];
    var secondResult = lookup[second];

    // Due to the way Lookup works, this is quick - much quicker than
    // calling query.Count()
    var count = firstResult.Count() + secondResult.Count();

    var query = firstResult.Concat(secondResult);

    foreach (var row in query)
    {
        ...
    }
}

Upvotes: 6

Marc Gravell
Marc Gravell

Reputation: 1063599

At the moment you have O(N*M) performance, which could be probematic if both N and M are large. I would be inclined to pre-compute some of the DataTable info. For example, we could try:

var lookup = dtRows.AsEnumerable().ToLookup(
        row => string.Compare(row.Field<string>("F1"),row.Field<string>("F2"))<0
           ? Tuple.Create(row.Field<string>("F1"), row.Field<string>("F2"))
           : Tuple.Create(row.Field<string>("F2"), row.Field<string>("F1")),
        row => row.Field<string>("EdgeId"));

then we can iterate that:

foreach(IEdge ed in edCol)
{
    var name1 = string.Compare(ed.Vertex1.Name,ed.Vertex2.Name) < 0
           ? ed.Vertex1.Name : ed.Vertex2.Name;
    var name2 = string.Compare(ed.Vertex1.Name,ed.Vertex2.Name) < 0
           ? ed.Vertex2.Name : ed.Vertex1.Name;

    var matches = lookup[Tuple.Create(name1,name2)];
    // ...
}

(note I enforced ascending alphabetical pairs in there, for convenience)

Upvotes: 1

Related Questions