Deltax76
Deltax76

Reputation: 14263

Extension methods on IEnumerable<T>: how is it performance?

From my mentor: Prefer native methods (implemented directly on the collection) over extension methods of IEnumerable, because:

The LINQ-to-Objects extension methods are implemented on IEnumerable, meaning that in the worst-case scenario (when the item you search for does not exist in the collection) you will have to enumerate thru all elements. If you have a Contains or Exists method implemented directly on the collection, it could make use of internal knowledge and maybe just do a hash table look up or some other quick operation.

I was a deeply confused, because I think Microsoft should have implemented hash table for IEnumerable Contains/Exists already. A quick benchmark with List and IEnumerable show no differences:

static void Main(string[] args)
{
    Console.Write("input the number of elements: ");
    int count = Convert.ToInt32(Console.ReadLine());
    Console.Write("input the number of loops: ");
    int loop = Convert.ToInt32(Console.ReadLine());

    Random r = new Random();

    Stopwatch sw = new Stopwatch();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContains(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContainsEnumerable(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContains(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContainsEnumerable(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);


    Console.ReadKey();
}

static List<int> CreateListOfInt(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next());
    }
    return numbers;
}

static bool DoContains(List<int> list, int number)
{
    return list.Contains(number);
}

static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
    return list.Contains(number);
}


//define the scope of randomly created number, to make sure that lookup number will not in the List
static List<int> CreateListOfInt2(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next(0,10000));
    }
    return numbers;
}

}

Edit: I tried HashSet implementation, which greatly increases performance:

  sw.Reset();
            for (int i = 0; i < loop; i++)
            {
                var list = CreateListOfInt2(count);
                HashSet<int> hashtable = new HashSet<int>(list);
                sw.Start();
                for (int j = 0; j < count; j++)
                {
                    //make sure that the element is not in the list
                    hashtable.Contains(r.Next(20000, 50000));
                }
                sw.Stop();
            }
            Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

Still, what is your opinion about my mentor saying?

Can anyone clear out for me? Is my mentor right? If he's right, what is wrong with my code?

Thank you very much

Upvotes: 5

Views: 588

Answers (2)

Cameron MacFarland
Cameron MacFarland

Reputation: 71886

If the LINQ version has a quicker native implementation on the object then that quicker implementation is used instead.

For example, Count is implemented like so:

if (source is Array)
    return source.Length;
if (source is ICollection)
    return source.Count;
// else iterate through all the items and count them.

Contains like so:

if (source is ICollection)
    return source.Contains(item);
// else iterate through the enumerable, and see if item exists

Since a HashSet<T> implements ICollection<T> the native Contains is used.

So, LINQ has been optimized for the standard interfaces. However, if you have a custom type that has a native call that isn't part of the default interface then the LINQ call may be slower.

Upvotes: 0

dlev
dlev

Reputation: 48596

List<T> Contains calls are just iterating the list, so they won't be faster than the extension method. If you were to use a HashSet<T> and try a series of Contains() operations, you would find a marked improvement.

Edit: the reason Microsoft didn't utilize a hash for the IEnumerable<T> extension methods is they could not guarantee that the implementing class used a hash or something similar. They had to go with the naive approach because the IEnumerable<T> interface only guarantees that the implementing class be enumerated.

Upvotes: 4

Related Questions