Reputation: 588
I am working on a problem which requires me to independently change the value of a dictionary in python.
The code is like below
g = 1 # a variable
for j in lst1:
dictionary = fo_1(j) # a function to update the dictinary
for key in dictionary: # need to proccess in parallel
dictionary[key] = fo_2(dictionary[key], key, j, g)
As you can see, each time when the fo_2
is called, we only access one value by key
and modify it in dictionary
.
Upvotes: 0
Views: 868
Reputation: 661
Your question isn't entirely clear to me.
First, if you're using CPython then due to the GIL this is technically safe. In other interpreters, your results may vary.
Second, in an interpreter without a GIL, assuming a threadsafe dictionary implementation, this is still very error prone. If fo_1(j) is doing something to the key j
in the dictionary while you are also iterating over all keys in the dictionary, then you might or might not make it to key j
before it has been modified. In other words, in the general case what you propose is going to result in non-deterministic behavior, which is A Bad Thing.
If your specific use case has constraints that can somehow guarantee that this won't result in a race condition, and if you are using an interpreter without a GIL (ie other than CPython), and if that interpreter provides a threadsafe implementation of dictionary, only then you might reasonably ask how to go about writing such multithreaded code.
I apologize in advance if what I wrote below seems overly blunt, but I honestly don't think you are ready to be pursuing concurrent programming just yet.
Your particular task in this case does not appear well suited to concurrent code, because all stages of your code try to modify the same collection of data. Concurrent execution is much better suited to tasks where you have a single read-only data structure, and want to accomplish multiple independent tasks based on that data. I should also note that you generally don't want to introduce the added complexity of concurrent code unless achieving the desired level of performance requires it.
I am using Cpython.
It seems you have a fundamental misunderstanding - multithreaded code in CPython does not execute concurrently. Instead, the threads take turns executing their (separate) instruction streams. From the docs, "Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions." Also see this SO answer for an explanation.
To achieve actual concurrent execution of native Python code in CPython requires multiple Python processes (ie multiprocess), not just multiple threads. How to do that is beyond the scope of this answer at this point.
I think it is still doable since the inner loop starts after we finish the outer loop.
This is incorrect, given the code you provided in your question. In your provided code, the inner loop will execute fully (ie from start to end) once during each iteration of the outer loop. First fo_1(j)
is called, and completes. Then the inner loop begins, calling fo_2(...)
for each key, and completes. Then the outer loop advances to the next item, and this pattern repeats.
The fo_1 might delete the old key or create a new key to the dictionary. The fo_2 only modifies the value, and does nothing to key. I think it is still doable since the inner loop starts after we finish the outer loop. I only need to finish the inner loop in parallel.
There's... a number of problems with your apparent understanding here. Most obviously, if you were to achieve actual concurrent execution, what do you expect to happen when:
fo_2(...)
tries to modify the value of a key that's (in the mean time) been deleted by fo_1(j)
?fo_2(...)
attempt to modify the same value?fo_2(...)
reads a value to make a decision, while that same value is simultaneously being changed by a different invocation of fo_2(...)
?To be blunt, you do not seem to fully understand the many implications of parallel code, in particular that execution order is undefined. This has huge consequences.
The fo_1 might delete the old key or create a new key to the dictionary. The fo_2 only modifies the value, and does nothing to key.
This statement indicates a lack of understanding of this data structure. A Python dictionary is an associative array, a typical implementation of which is the hash table. These consist of a number of key -> value
mappings. So to restate your first sentence, the call fo_1(j)
might either add or delete a key -> value
mapping from the table. Now, if you try to modify the value of a key -> value
mapping that has been (concurrently) deleted, what happens? This will depend on the semantics of the code you wrote, but typically a key that doesn't already exist will be inserted. In other words, fo_2(...)
might well end up adding deleted keys back into the dictionary, depending on how exactly you implement it.
Upvotes: 1