Zoyeb Shaikh
Zoyeb Shaikh

Reputation: 334

how to improve foreach loop performance in C#

class Student
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

class Program
{
    private static object _lockObj = new object();

    static void Main(string[] args)
    {

        List<int> collection = Enumerable.Range(1, 100000).ToList();


         
        List<Student> students = new List<Student>(100000);
        var options = new ParallelOptions()
        {
            MaxDegreeOfParallelism = 1
        };
        var sp = System.Diagnostics.Stopwatch.StartNew();
        sp.Start();
        Parallel.ForEach(collection, options, action =>
        {
            lock (_lockObj)
            {
                 var dt = collection.FirstOrDefault(x => x == action);
                 if (dt > 0)
                 {
                    Student student = new Student();
                    student.ID = dt;
                    student.Name = "Zoyeb";
                    student.Email = "[email protected]";

                    students.Add(student);
                    Console.WriteLine(@"value of i = {0}, thread = {1}", 
                    action,Thread.CurrentThread.ManagedThreadId);
                }
            }
        });
        sp.Stop();

        double data = Convert.ToDouble(sp.ElapsedMilliseconds / 1000);
        Console.WriteLine(data);
         
    }
}

I want to loop through 100000 records as quickly as possible i tried foreach loop but it is not quite good for loop through 100000 records then after i tried to implement Parallel.ForEach() that improved my performance , in real scenario i will have collection of Ids and i need to lookup into collection whether id exits or not if exits then add. performance is hitting in condition when i comment condition it took around 3 seconds to execute and when i uncomment condition it took around 24 seconds so my question is there any way i can boost my performance by looking up id in collection

         //var dt = collection.FirstOrDefault(x => x == action);
         //if (dt > 0)
         //{
            Student student = new Student();
            student.ID = 1;
            student.Name = "Zoyeb";
            student.Email = "[email protected]";

            students.Add(student);
            Console.WriteLine(@"value of i = {0}, thread = {1}", 
            action,Thread.CurrentThread.ManagedThreadId);
        //}

Upvotes: 1

Views: 3470

Answers (2)

Enigmativity
Enigmativity

Reputation: 117029

Your original code is doing a lock inside a Parallel.ForEach. That's essentially taking the parallel code and forcing it to run in series.

It takes 40 seconds on my machine.

It is really the equivalent of doing this:

    foreach (var action in collection)
    {
            var dt = collection.FirstOrDefault(x => x == action);
            if (dt > 0)
            {
                Student student = new Student();
                student.ID = dt;
                student.Name = "Zoyeb";
                student.Email = "[email protected]";

                students.Add(student);
            }
    }

Which also takes 40 seconds.

However, if you just do this:

    foreach (var action in collection)
    {
        Student student = new Student();
        student.ID = action;
        student.Name = "Zoyeb";
        student.Email = "[email protected]";

        students.Add(student);
    }

That takes 1 millisecond to run. It's roughly 40,000 times quicker.

In this case you can get much faster loops by iterating your collection once, not in a nested way and not using Parallel.ForEach.


My ap0ologies for missing that the bit about the id not existing.

Try this:

    HashSet<int> hashSet = new HashSet<int>(collection);

    List<Student> students = new List<Student>(100000);

    var sp = System.Diagnostics.Stopwatch.StartNew();
    sp.Start();
    foreach (var action in collection)
    {
        if (hashSet.Contains(action))
        {
            Student student = new Student();
            student.ID = action;
            student.Name = "Zoyeb";
            student.Email = "[email protected]";

            students.Add(student);
        }
    }
    sp.Stop();

That runs in 3 milliseconds.

An alternative is to use a join like this:

    foreach (var action in
        from c in collection
        join dt in collection on c equals dt
        select dt)
    {
        Student student = new Student();
        student.ID = action;
        student.Name = "Zoyeb";
        student.Email = "[email protected]";

        students.Add(student);
    }

That runs in 25 milliseconds.

Upvotes: 1

Preben Huybrechts
Preben Huybrechts

Reputation: 6111

Problem 1

You are using a lock instead of a concurrent collection in a parallel foreach. Forcing the parallel foreach to wait, to access the lock and thus execution will be one-by-one.

Change your list to a ConcurrentBag, and remove the lock from the ParallelForEach

// using System.Collections.Concurrent; // at the top
var students = new ConcurrentBag<Student>()

Problem 2

FirstOrDefault() Isn't very performant if you want to select by Id. Use a Dictionary. Since the dictionary does a Hashmatch its much faster then FirstOrDefault. See this questoin.

Change your collection to be a dictionary:

var collection = Enumerable.Range(1, 100000)
    .ToDictionary(x=> x);

Change the access in your loop to:

if(collection.TryGetValue(action, out var dt))
{
  //....
}

Problem 3

Stopwatch is not a benchmarking tool. Please use Benchmark.Net or a different library.

Upvotes: 0

Related Questions