Bash scripting: reader writer lock

Question

Imagine a network of several *nix machines. A dedicated node stores files and periodically schedules Task A, which modifies these files. Each of the other nodes schedules Task B, which syncs (rsync) those files to local storage.

Task A can take a considerable amount of time, and the file collection needs to be in a consistent state on all nodes. Thus Task B shouldn't run while Task A is running.

A possible solution for this is to use a reader-writer lock. Task A and Task B would put a write and a read lock on the resource respectively.

I wonder how we can implement such a locking mechanism with Unix shell scripting.

rici · Accepted Answer

The usual way of doing this is with the flock utility, which is part of the util-linux package. FreeBSD and NetBSD packages are also available, aiui, and probably others. (For MacOSX, see this question.)

The flock command can take both read ("shared") locks and write ("exclusive") locks. It is based on the flock(2) system call and consequently provides co-operative locking (aka advisory locking), which works fine in most applications (but see below for the case where the file is remote).

There are usage examples in the flock man page. The simplest use case is:

flock /tmp/lockfile /usr/local/bin/do_the_update
flock -s /tmp/lockfile /usr/local/bin/do_the_rsync

Both of these obtain a lock on /tmp/lockfile and then execute the specified command (presumably a shell script). The first command obtains an exclusive lock; I could have made that explicit with the -x option. The second command obtains a shared lock. Note that options such as -s go before the lock file name.
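If you would rather take the lock from inside the script than wrap the whole command, the flock man page also documents a file-descriptor form. Here is a minimal sketch of that form; the function names, the LOCKFILE path and the LOCK_TIMEOUT variable are my own placeholders, not anything flock defines:

```shell
# Sketch only: LOCKFILE, LOCK_TIMEOUT and the function names are hypothetical.
LOCKFILE="${LOCKFILE:-/tmp/lockfile}"
LOCK_TIMEOUT="${LOCK_TIMEOUT:-600}"   # seconds a writer waits before giving up

# Run a command while holding a shared (read) lock.
run_as_reader() {
    (
        flock -s 9 || exit 1          # blocks while a writer holds the lock
        "$@"
    ) 9>"$LOCKFILE"                   # fd 9 stays open for the whole subshell
}

# Run a command while holding an exclusive (write) lock.
run_as_writer() {
    (
        # -w makes flock fail (exit status 1) if the lock is not obtained in time
        flock -x -w "$LOCK_TIMEOUT" 9 || exit 1
        "$@"
    ) 9>"$LOCKFILE"
}
```

With these, Task B would be something like `run_as_reader /usr/local/bin/do_the_rsync` and Task A `run_as_writer /usr/local/bin/do_the_update`. The lock is released when the subshell exits and fd 9 is closed, even if the command fails.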

Since the question actually involves the need for a network lock, it is necessary to point out that flock() may not be reliable on a networked filesystem. Normally, the target file should always be local.
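One way to keep the lock file local to the fileserver while the rsync is started from another node is to have the fileserver-side rsync process run under flock. This is only a sketch: it assumes rsync over a remote shell such as ssh (not an rsync daemon) and flock installed on the fileserver. The host name, the paths and the RSYNC override variable are hypothetical.

```shell
# Sketch only: host, paths and the RSYNC override are hypothetical placeholders.
REMOTE_SRC="${REMOTE_SRC:-fileserver:/srv/files/}"
LOCAL_DST="${LOCAL_DST:-/var/local/files/}"
REMOTE_LOCK="${REMOTE_LOCK:-/var/lock/files.lock}"   # path must not contain spaces
RSYNC="${RSYNC:-rsync}"                              # set RSYNC=echo for a dry run

sync_from_fileserver() {
    # --rsync-path names the program started on the remote side. It is run
    # through the remote shell, so the fileserver-side rsync is wrapped in a
    # shared flock on a file that is local to the fileserver.
    "$RSYNC" -a --rsync-path="flock -s $REMOTE_LOCK rsync" \
        "$REMOTE_SRC" "$LOCAL_DST"
}
```

Task A on the fileserver would then take the exclusive lock on the same local file, e.g. `flock -x /var/lock/files.lock /usr/local/bin/do_the_update`. The shared lock is held only as long as the remote rsync process is alive, so the failure cases discussed below still apply.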

Even in a non-distributed application, you need to consider the possibility of failure. Suppose, for example, that you were rsync'ing locally to create a copy. If the host crashes while the rsync is in progress, you will end up with an incomplete or corrupt copy. rsync can recover from that, but there is no guarantee that, when the host restarts, the rsync will start before the files are modified again. That shouldn't be a problem, but you definitely need to take it into account.

In a distributed application, the situation is more complex because the entire system rarely fails. You can have independent failure of the different servers or of the network itself.

Advisory locking is not persistent. If the lockfile's host crashes with the lock held and restarts, the lock will not be held after the restart. On the other hand, if one of the remote servers which holds the lock crashes and restarts, it may not be aware that it is holding the lock, in which case the lock will never be released.

If both servers were 100% aware of each other's state, this wouldn't be a problem, but it is very difficult to distinguish network failure from host failure.

You will need to evaluate the risks. As with the local case, if the fileserver crashes while an rsync is in progress, it may restart and immediately start modifying the files. If the remote rsync's did not fail while the fileserver was down, they will continue to attempt to synchronize and the resulting copy will be corrupt. With rsync, this should resolve itself on the next sync cycle, but in the interim you have a problem. You will need to decide how serious this is.

You can prevent the fileserver from starting the mutator on startup by using persistent locks. Each rsync server creates its own indicator file on the fileserver before starting the rsync (and does not start the rsync until it knows that the file exists) and deletes the file before releasing the read lock. The fileserver does not start the mutator while any indicator file exists. If an rsync server restarts and its indicator file exists, it knows that there was a crash during the rsync, so it must delete the indicator file and restart the rsync.
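The fileserver side of that check could look roughly like this. It is a sketch under assumptions: the indicator files are assumed to live in one directory on the fileserver (MARKER_DIR, a name I made up), one file per rsync server, and how the rsync servers create and delete them remotely (e.g. over ssh) is left out.

```shell
# Sketch only: MARKER_DIR and the function names are hypothetical.
MARKER_DIR="${MARKER_DIR:-/var/lock/rsync-in-progress}"

# Succeeds if at least one indicator file exists.
sync_in_progress() {
    for f in "$MARKER_DIR"/*; do
        [ -e "$f" ] && return 0      # an unmatched glob stays literal and fails -e
    done
    return 1
}

# Run the mutator (passed as arguments) only if no rsync is marked as running.
start_update_if_safe() {
    if sync_in_progress; then
        echo "indicator files present in $MARKER_DIR; not starting update" >&2
        return 1
    fi
    "$@"
}
```

On the fileserver this would be called as, say, `start_update_if_safe flock -x /tmp/lockfile /usr/local/bin/do_the_update`, so the advisory lock is still taken once the persistent check passes.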

This will work fine most of the time, but it can fail if an rsync server crashes during the rsync and never restarts, or restarts only after a long time. (Or, equivalently, if network failure isolates the rsync server for a long time.) In these cases, it is likely that manual intervention will be necessary. It would be useful to have a watchdog process running on the fileserver which alerts an operator if the read lock has been held for too long, for some definition of "too long".
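A very simple watchdog could use the age of the indicator files as a proxy for how long the read lock has been held. Again a sketch, not a finished tool: MARKER_DIR and MAX_AGE_MIN are hypothetical names, `find -mmin` is a GNU/BSD extension rather than POSIX, and "alerting" here just means writing to stderr and returning non-zero (for instance so that cron mails the output).

```shell
# Sketch only: MARKER_DIR, MAX_AGE_MIN and the function names are hypothetical.
MARKER_DIR="${MARKER_DIR:-/var/lock/rsync-in-progress}"
MAX_AGE_MIN="${MAX_AGE_MIN:-120}"    # your definition of "too long", in minutes

# List indicator files last modified more than MAX_AGE_MIN minutes ago.
stale_markers() {
    find "$MARKER_DIR" -type f -mmin +"$MAX_AGE_MIN" 2>/dev/null
}

# Report stale indicator files; returns 1 if there are any.
check_stale_locks() {
    stale=$(stale_markers)
    if [ -n "$stale" ]; then
        printf 'read lock held too long by:\n%s\n' "$stale" >&2
        return 1
    fi
    return 0
}
```

Run `check_stale_locks` periodically on the fileserver and hook whatever notification you already use onto the non-zero exit status.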
