adveach

Reputation: 75

Best way to update a value in a List variable being passed to a multi-threaded method

As if it's not obvious from my question, this is my first time working with multi-threading and I need some help. I have a method that is now called by multiple threads concurrently, and it is passed a variable (defined as shown below). The part I am having problems with is updating the "counter" (numProcessed) member of the custom List variable, which is supposed to keep track of the total number of rows processed across all threads. I assume this should probably be a class instead of this overcomplicated List structure, but I am trying to work within the current constraints.

DoWorkMethod(List<(String dataSetCode, int dataSetNum, int rowCount, double modVal,
    int numProcessed, List<Task> queryTasks)> lstDataSetCdTasks)
//CS1612 Error Workaround
var temp = lstDataSetCdTasks[dataSetNum];
//Interlocked.Exchange(
//    ref temp.numProcessed,lstDataSetCdTasks[dataSetNum].numProcessed + 1);
temp.numProcessed = lstDataSetCdTasks[dataSetNum].numProcessed + 1;
lstDataSetCdTasks[dataSetNum] = temp;

currRow.Cells["Extract"].Value = lstDataSetCdTasks[dataSetNum].numProcessed / 
lstDataSetCdTasks[dataSetNum].modVal + "| % (" + 
lstDataSetCdTasks.ElementAt(dataSetNum).queryTasks.Count + " threads)";

What is the best way to update this counter in a thread-safe manner? I thought I needed to use Interlocked.Exchange(), given that I can't update lstDataSetCdTasks directly and have to assign the element to a temp variable first and then back again in order to work around the CS1612 error. However, the update still fails sporadically. Thanks in advance.

Upvotes: 1

Views: 916

Answers (2)

Theodor Zoulias

Reputation: 43474

The List<T> class is thread-safe only for multiple readers. As soon as you add a writer to the mix, you must synchronize all interactions with the collection, otherwise its behavior is undefined. The simplest synchronization tool is the lock statement. This statement requires a locker object, which can be an instance of any reference type, and is usually either a dedicated new object() or the list itself. The locker should not "leak" to the outside world, so using the list itself as the locker is only viable if the list is internal and you don't expose it to unknown code.

For demonstration purposes I'll show an example with a simpler list than your lstDataSetCdTasks, a list that contains value tuples with just two members. I am showing three approaches to incrementing the NumProcessed member. Take a look at them, and I'll explain them below:

List<(string DataSetCode, int NumProcessed)> list
    = new() { ("A", 0), ("B", 0), ("C", 0) };
int index = 1;
Console.WriteLine($"Before: {String.Join(", ", list)}");

lock (list)
{
    (string DataSetCode, int NumProcessed) temp = list[index];
    temp.NumProcessed++;
    list[index] = temp;
}
Console.WriteLine($"After1: {String.Join(", ", list)}");

lock (list)
{
    Span<(string DataSetCode, int NumProcessed)> span = CollectionsMarshal
        .AsSpan(list);
    span[index].NumProcessed++;
}
Console.WriteLine($"After2: {String.Join(", ", list)}");

// Incorrect code, for educational purposes only
{
    Span<(string DataSetCode, int NumProcessed)> span = CollectionsMarshal
        .AsSpan(list);
    Interlocked.Increment(ref span[index].NumProcessed);
}
Console.WriteLine($"After3: {String.Join(", ", list)}");

Output:

Before: (A, 0), (B, 0), (C, 0)
After1: (A, 0), (B, 1), (C, 0)
After2: (A, 0), (B, 2), (C, 0)
After3: (A, 0), (B, 3), (C, 0)

Online demo.

The first approach locks on the list, creating a protected region that only one thread can enter at a time¹. Inside the protected region we store a copy of a tuple in a temp variable, we mutate the copy, and then we replace the existing tuple in the list with the mutated copy. This is the simplest way to mutate a value-type stored in a List<T>.

The second approach again locks on the list, and then uses the advanced CollectionsMarshal.AsSpan to get a Span<T> representation of the list. With the Span<T> you gain direct access to the backing array of the list, and so you can mutate the stored value-tuples in-place, without using temporary variables. This is the most efficient way of mutating value-types stored in a List<T>.

The third approach doesn't use the lock statement, and instead attempts to grab the Span<T> and then mutate an entry with the Interlocked.Increment method. This is valid C# code, but it is not thread-safe and it has undefined behavior. The problem is that another thread might concurrently perform an action that replaces the backing array of the list, in which case the mutation performed by the current thread will be lost. This would be a valid approach, though, if instead of a list you stored your tuples in an array ((string DataSetCode, int NumProcessed)[]). Arrays have a fixed length, so they are less versatile than lists. But they open up some opportunities for lock-free multithreading, opportunities that lists totally lack. I am not advising you to pursue these opportunities though. As a beginner in multithreading, it's much safer to stick with the lock. As long as you don't do anything heavy inside the protected regions, the locks are cheap and won't slow down your application.
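For completeness, here is a minimal sketch of the array-based variant mentioned above. It assumes the number of data sets is known up front and never changes, so the array is never resized or replaced. The data and the iteration count are made up for illustration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Assumption: a fixed-length array replaces the List<T>, so the storage
// of each element never moves while the threads are running.
(string DataSetCode, int NumProcessed)[] array =
{
    ("A", 0), ("B", 0), ("C", 0)
};
int index = 1;

// 1000 iterations spread across thread-pool threads. Each iteration
// atomically increments the counter of the same array element, in place.
Parallel.For(0, 1000, _ =>
{
    Interlocked.Increment(ref array[index].NumProcessed);
});

// Parallel.For returns only after all iterations have completed.
int total = Volatile.Read(ref array[index].NumProcessed);
Console.WriteLine($"{array[index].DataSetCode}: {total}"); // B: 1000
```

This only protects the int counter itself. Any other member of the tuple that is mutated by multiple threads would still need its own synchronization.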

¹ Provided that all other interactions with the list are performed in regions protected with the same locker. Caution: even a single unprotected interaction with the list, even reading the Count property, renders your program invalid and its behavior undefined. Enumerating the list should also be enclosed in a protected region. Interacting with the List<T>.Enumerator counts as interacting with the list itself.
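The requirement in the footnote can be sketched roughly as follows. The locker field and the Increment and Snapshot helpers are hypothetical names chosen for illustration. The idea is that writers and readers both go through the same locker, and that enumeration happens on a copy taken inside the protected region:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

List<(string DataSetCode, int NumProcessed)> list
    = new() { ("A", 0), ("B", 0), ("C", 0) };
object locker = new(); // dedicated locker, never exposed to other code

// Writer: copy, mutate and replace the tuple inside the protected region.
void Increment(int index)
{
    lock (locker)
    {
        var temp = list[index];
        temp.NumProcessed++;
        list[index] = temp;
    }
}

// Reader: copy the list under the same locker, so that the enumeration
// afterwards works on a private array instead of the shared list.
(string DataSetCode, int NumProcessed)[] Snapshot()
{
    lock (locker) return list.ToArray();
}

// 1000 increments distributed over the three entries by multiple threads.
Parallel.For(0, 1000, i => Increment(i % 3));

var snapshot = Snapshot();
Console.WriteLine(String.Join(", ", snapshot)); // (A, 334), (B, 333), (C, 333)
```

Taking a snapshot keeps the protected region short, at the cost of one array allocation per read.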

Upvotes: 2

myteron

Reputation: 1

I suggest you use the concurrent.futures module as it provides a common interface for Multithreading and Multiprocessing in Python.
concurrent.futures

ProcessPoolExecutor == CPU heavy tasks such as calculating pi
ThreadPoolExecutor == i/o heavy, wait states such as web-scraping

The provided map function makes it easy to pass an array in and out. It does, however, lack fine-grained control over exceptions and timeouts, so don't use map if you need control over timeouts or exception handling.

ThreadPoolExecutor in Python: The Complete Guide has a very good explanation of almost everything on that subject.

Be warned: only use multi-x if you absolutely need it as it is:

  • More overhead in processing
  • More complex in programming
  • Slower to troubleshoot
  • Potentially slower in processing

Upvotes: -1
