Reputation: 14263
From my mentor: Prefer native methods (implemented directly on the collection) over extension methods of IEnumerable, because:
The LINQ-to-Objects extension methods are implemented on IEnumerable, meaning that in the worst-case scenario (when the item you search for does not exist in the collection) you will have to enumerate thru all elements. If you have a Contains or Exists method implemented directly on the collection, it could make use of internal knowledge and maybe just do a hash table look up or some other quick operation.
I was a deeply confused, because I think Microsoft should have implemented hash table for IEnumerable Contains/Exists already. A quick benchmark with List and IEnumerable show no differences:
static void Main(string[] args)
{
Console.Write("input the number of elements: ");
int count = Convert.ToInt32(Console.ReadLine());
Console.Write("input the number of loops: ");
int loop = Convert.ToInt32(Console.ReadLine());
Random r = new Random();
Stopwatch sw = new Stopwatch();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt(count);
sw.Start();
for (int j = 0; j < count; j++)
{
DoContains(list, r.Next());
}
sw.Stop();
}
Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt(count);
sw.Start();
for (int j = 0; j < count; j++)
{
DoContainsEnumerable(list, r.Next());
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
DoContains(list, r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
DoContainsEnumerable(list, r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
Console.ReadKey();
}
static List<int> CreateListOfInt(int count)
{
Random r = new Random(1000);
List<int> numbers = new List<int>(count);
for (int i = 0; i < count; i++)
{
numbers.Add(r.Next());
}
return numbers;
}
static bool DoContains(List<int> list, int number)
{
return list.Contains(number);
}
static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
return list.Contains(number);
}
//define the scope of randomly created number, to make sure that lookup number will not in the List
static List<int> CreateListOfInt2(int count)
{
Random r = new Random(1000);
List<int> numbers = new List<int>(count);
for (int i = 0; i < count; i++)
{
numbers.Add(r.Next(0,10000));
}
return numbers;
}
}
Edit: I tried HashSet implementation, which greatly increases performance:
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
HashSet<int> hashtable = new HashSet<int>(list);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
hashtable.Contains(r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
Still, what is your opinion about my mentor saying?
Can anyone clear out for me? Is my mentor right? If he's right, what is wrong with my code?
Thank you very much
Upvotes: 5
Views: 588
Reputation: 71886
If the LINQ version has a quicker native implementation on the object then that quicker implementation is used instead.
For example, Count
is implemented like so:
if (source is Array)
return source.Length;
if (source is ICollection)
return source.Count;
// else iterate through all the items and count them.
Contains
like so:
if (source is ICollection)
return source.Contains(item);
// else iterate through the enumerable, and see if item exists
Since a HashSet<T>
implements ICollection<T>
the native Contains is used.
So, LINQ has been optimized for the standard interfaces. However, if you have a custom type that has a native call that isn't part of the default interface then the LINQ call may be slower.
Upvotes: 0
Reputation: 48596
List<T>
Contains
calls are just iterating the list, so they won't be faster than the extension method. If you were to use a HashSet<T>
and try a series of Contains()
operations, you would find a marked improvement.
Edit: the reason Microsoft didn't utilize a hash for the IEnumerable<T>
extension methods is they could not guarantee that the implementing class used a hash or something similar. They had to go with the naive approach because the IEnumerable<T>
interface only guarantees that the implementing class be enumerated.
Upvotes: 4