Reputation: 8367
What is the correct way to make sure a file is never corrupted when it is written to by many threads and processes?
Here is my version for threads, which handles opening errors:
lock = threading.RLock()
with lock:
    try:
        f = open(file, 'a')
        try:
            f.write('sth')
        finally:
            f.close()  # close in any circumstances if open succeeded
    except IOError:
        pass  # when open failed
For processes I guess I must use multiprocessing.Lock.
But what if I have 2 processes, and the first process owns 2 threads (each one using the file)?
I want to know how to mix synchronization between threads and processes. Do threads "inherit" it from the process, so that only synchronization between processes is required?
Also, I'm not sure whether the above code needs the nested try for the case when the write operation fails and we want to close the opened file (what if it remains open after the lock is released?).
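As an aside, the nested try in the snippet above can be avoided entirely by using the file object as a context manager, which closes the file even if write() raises. A minimal sketch (append_line is a hypothetical helper name, not from the question):

```python
import threading

lock = threading.RLock()

def append_line(path, text):
    """Append text to path under the lock; never leaves the file open."""
    with lock:
        try:
            # "with open(...)" closes the file even if write() fails,
            # so no nested try/finally is needed
            with open(path, 'a') as f:
                f.write(text)
        except IOError:
            pass  # open (or write) failed
```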
Upvotes: 18
Views: 14752
Reputation: 365577
While this isn't entirely clear from the docs, multiprocessing synchronization primitives do in fact synchronize threads as well.
For example, if you run this code:
import multiprocessing
import sys
import threading
import time

lock = multiprocessing.Lock()

def f(i):
    with lock:
        for _ in range(10):
            sys.stderr.write(i)
            time.sleep(1)

t1 = threading.Thread(target=f, args=['1'])
t2 = threading.Thread(target=f, args=['2'])
t1.start()
t2.start()
t1.join()
t2.join()
… the output will always be 11111111112222222222 or 22222222221111111111, not a mixture of the two.
The locks are implemented on top of Win32 kernel sync objects on Windows, semaphores on POSIX platforms that support them, and not implemented at all on other platforms. (You can test this with import multiprocessing.synchronize, which will raise an ImportError on those other platforms, as explained in the docs.)
That being said, it's certainly safe to have two levels of locks, as long as you always use them in the right order: that is, never grab the threading.Lock unless you can guarantee that your process has the multiprocessing.Lock.
If you do this cleverly enough, it can have performance benefits. (Cross-process locks on Windows, and on some POSIX platforms, can be orders of magnitude slower than intra-process locks.)
If you just do it in the obvious way (only do with threadlock: inside with processlock: blocks), it obviously won't help performance, and in fact will slow things down a bit (although quite possibly not enough to measure), and it won't add any direct benefits. Of course your readers will know that your code is correct even if they don't know that multiprocessing locks work between threads, and in some cases debugging intra-process deadlocks can be a lot easier than debugging inter-process deadlocks… but I don't think either of those is a good enough reason for the extra complexity in most cases.
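The ordering rule can be sketched like this (the names processlock and threadlock follow the text above; this shows the "obvious" nesting, only to illustrate a consistent lock order):

```python
import multiprocessing
import threading

processlock = multiprocessing.Lock()  # cross-process lock (also works between threads)
threadlock = threading.Lock()         # cheap intra-process lock

counter = 0

def worker(n):
    global counter
    # Lock order: processlock first, threadlock only while it is held.
    # Taking them in the opposite order elsewhere could deadlock.
    with processlock:
        for _ in range(n):
            with threadlock:
                counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # all 4000 increments survive
```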
Upvotes: 13