haarigertroll

Reputation: 117

Best way to communicate resource lock between processes

I have two python programs that are supposed to run in parallel and do the same thing:

  1. Read and unzip data from disk (takes about 1 min)
  2. Process data (takes about 2-3 min)
  3. Send data to database (takes about 3-5 min)

As you can see, it would be nice to have the execution of both instances synchronized in a way that one does the processor-heavy steps 1 and 2 (the implementation is multithreaded, so the CPU can actually be maxed out) while the other does the I/O-heavy step 3 and vice versa.

My first idea was to use a lockfile, which is acquired by each process upon entering phase 3 and released after completing it. The other process would then wait until the lock is released and set it itself when it enters phase 3. However, this seems like a very cumbersome way to do it. Also, the system is supposed to run unsupervised for days and weeks with the ability to recover from errors, scheduled reboots or power failures. Especially in the last case, a leftover lockfile could simply block everything.
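Something like the following is what I had in mind; the file name and the polling interval are just placeholders:

```python
# Sketch of the lockfile idea; "phase3.lock" and the 1 s poll are placeholders.
import os
import time

LOCKFILE = "phase3.lock"

def acquire_lock():
    """Block until the lockfile can be created atomically."""
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so creating the file doubles as an atomic test-and-set
            fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())  # record the holder's PID
            os.close(fd)
            return
        except FileExistsError:
            time.sleep(1)  # the other process is still in phase 3

def release_lock():
    os.remove(LOCKFILE)
```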

Is there a more elegant way to communicate the lockout between the two processes? Or should I rather use the lockfile and try to implement some smart cleanup functionality to keep a deadlock from happening?

Upvotes: 5

Views: 1945

Answers (2)

Gerd

Reputation: 2803

It seems that every solution has some drawbacks: either some mechanism or module is not available on all platforms (e.g. Linux only or Windows only), or you may run into error-recovery issues with a file-system-based approach (as you have already pointed out in your question).

Here is a list of some possible options:

Use Python's multiprocessing module

This allows you to create a lock like this:

lock = multiprocessing.Lock()

and to acquire and release it like this:

lock.acquire() 
# do something
lock.release() 

Here is a complete example.

Pro: Straightforward to use; cross-platform; no issues with error recovery.

Con: Since you currently have two separate programs, you will have to rearrange your code to start both processes from the same Python module.

Use fcntl (Linux)

For Linux/Unix systems, there is fcntl (with fcntl.flock()) available as a python module. This is based on lockfiles.

See also this discussion with some recommendations that I am repeating here:

  • Write the process ID of the locked process to the file for being able to recognize and fix possible deadlocks.
  • Put your lock files in a temporary location or a RAM file system.

Con: Not cross-platform, available on Linux/Unix systems only.
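A rough sketch of what this could look like, following the PID recommendation above (the path under /tmp is an assumption). Note that the kernel releases a flock() lock automatically when the holding process dies, which helps with the error-recovery concern:

```python
# Linux/Unix only: advisory locking via fcntl.flock()
import fcntl
import os

def locked_phase3(lockpath="/tmp/phase3.lock"):
    with open(lockpath, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the other process releases
        f.write(str(os.getpid()))      # record the holder for deadlock analysis
        f.flush()
        # ... send data to the database here ...
    # closing the file releases the lock; the kernel also releases it if the
    # process crashes, so a stale lockfile cannot deadlock the system
```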

Use posix_ipc (Linux)

For Linux/Unix systems, there is posix_ipc (with a Semaphore class) available as a Python module.

Pro: Not file-system based, no issues with error recovery.

Con: Not cross-platform, available on Linux/Unix systems only.

Use msvcrt (Windows)

For Windows systems, there is msvcrt (with msvcrt.locking()) available as a python module.

See also this discussion.

Con: Not cross-platform, available on Windows systems only.

Use a third-party library

You might want to check out the following Python libraries:

Upvotes: 6

Santiago P

Reputation: 101

If you are running into synchronization problems, in my opinion there is no better way than using semaphores. How you handle the cleanup and locking parts depends a lot on your problem. There are a lot of resources for this kind of issue, and Python already implements some primitives.
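For example, the standard library's multiprocessing.Semaphore is one such primitive; initialized to 1, it behaves like a mutex around the critical phase (the phase itself is just a placeholder here):

```python
import multiprocessing

sem = multiprocessing.Semaphore(1)  # value 1: at most one holder at a time

sem.acquire()
try:
    pass  # the I/O-heavy phase would run here
finally:
    sem.release()  # always release, even if the phase raises
```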

You can check this post for an example.

Also check out ZooKeeper. I have never used it with Python, but it is widely used in other languages.

Upvotes: 0
