Dusty

Reputation: 3971

Mutex violations using ServiceStack Redis for distributed locking

I'm attempting to implement DLM using the locking mechanisms provided by the ServiceStack-Redis library and described here, but I'm finding that the API seems to present a race condition which will sometimes grant the same lock to multiple clients.

BasicRedisClientManager mgr = new BasicRedisClientManager(redisConnStr);

using(var client = mgr.GetClient())
{
    client.Remove("touchcount");
    client.Increment("touchcount", 0);
}

Random rng = new Random();

Action<object> simulatedDistributedClientCode = (clientId) => {

    using(var redisClient = mgr.GetClient())
    {
        using(var mylock = redisClient.AcquireLock("mutex", TimeSpan.FromSeconds(2)))
        {
            long touches = redisClient.Get<long>("touchcount");
            Debug.WriteLine("client{0}: I acquired the lock! (touched: {1}x)", clientId, touches);
            if(touches > 0) {
                Debug.WriteLine("client{0}: Oh, but I see you've already been here. I'll release it.", clientId);
                return;
            }
            int arbitraryDurationOfExecutingCode = rng.Next(100, 2500);
            Thread.Sleep(arbitraryDurationOfExecutingCode); // do some work of arbitrary duration
            redisClient.Increment("touchcount", 1);
        }
        Debug.WriteLine("client{0}: Okay, I released my lock, your turn now.", clientId);
    }
};
Action<Task> exceptionWriter = (t) => {if(t.IsFaulted) Debug.WriteLine(t.Exception.InnerExceptions.First());};

int arbitraryDelayBetweenClients = rng.Next(5, 500);
var clientWorker1 = new Task(simulatedDistributedClientCode, 1);
var clientWorker2 = new Task(simulatedDistributedClientCode, 2);

clientWorker1.Start();
Thread.Sleep(arbitraryDelayBetweenClients);
clientWorker2.Start();

Task.WaitAll(
    clientWorker1.ContinueWith(exceptionWriter),
    clientWorker2.ContinueWith(exceptionWriter)
    );

using(var client = mgr.GetClient())
{
    var finaltouch = client.Get<long>("touchcount");
    Console.WriteLine("Touched a total of {0}x.", finaltouch);
}

mgr.Dispose();

When running the above code to simulate two clients attempting the same operation in short succession, there are three possible outcomes. The first is the optimal case, where the mutex works properly and the clients proceed in the correct order. The second is when the 2nd client times out waiting to acquire a lock; also an acceptable outcome. The problem, however, is that as arbitraryDurationOfExecutingCode approaches or exceeds the timeout for acquiring a lock, it is quite easy to reproduce a situation where the 2nd client is granted the lock BEFORE the 1st client releases it, producing output like this:

client1: I acquired the lock! (touched: 0x)
client2: I acquired the lock! (touched: 0x)
client1: Okay, I released my lock, your turn now.
client2: Okay, I released my lock, your turn now.
Touched a total of 2x.

My understanding of the API and its documentation is that the timeOut argument when acquiring a lock is meant to be just that -- the timeout for getting the lock. If I have to guess at a timeOut value that is high enough to always exceed the duration of my executing code just to prevent this condition, that seems pretty error-prone. Does anyone have a workaround other than passing null to wait on locks forever? I definitely don't want to do that, or I know I'll end up with ghost locks from crashed workers.
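For clarity, this is the "wait forever" usage I'm trying to avoid (a sketch; the overload without a timeOut just passes null through to the lock):

// Sketch of the workaround I'm ruling out: with no timeOut the client blocks
// until the lock is free, and a crashed holder leaves behind a ghost lock
// that blocks everyone else -- exactly what I don't want.
using (redisClient.AcquireLock("mutex"))
{
    // critical section
}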

Upvotes: 3

Views: 1393

Answers (2)

Dusty

Reputation: 3971

The answer from mythz (thanks for the prompt response!) confirms that the built-in AcquireLock method in ServiceStack.Redis doesn't draw a distinction between the lock acquisition period and the lock expiration period. For our purposes, we have existing code that expected the distributed locking mechanism to fail quickly if the lock was taken, but to allow for long-running processes within the lock scope. To accommodate these requirements, I derived this variation on the ServiceStack RedisLock that allows a distinction between the two.

// based on ServiceStack.Redis.RedisLock
// https://github.com/ServiceStack/ServiceStack.Redis/blob/master/src/ServiceStack.Redis/RedisLock.cs
internal class RedisDlmLock : IDisposable
{
    public static readonly TimeSpan DefaultLockAcquisitionTimeout = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan DefaultLockMaxAge = TimeSpan.FromHours(2);
    public const string LockPrefix = "";    // namespace lock keys if desired

    private readonly IRedisClient _client; // note that the held reference to client means lock scope should always be within client scope

    private readonly string _lockKey;
    private string _lockValue;

    /// <summary>
    /// Acquires a distributed lock on the specified key.
    /// </summary>
    /// <param name="redisClient">The client to use to acquire the lock.</param>
    /// <param name="key">The key to acquire the lock on.</param>
    /// <param name="acquisitionTimeOut">The amount of time to wait while trying to acquire the lock. Defaults to <see cref="DefaultLockAcquisitionTimeout"/>.</param>
    /// <param name="lockMaxAge">After this amount of time expires, the lock will be invalidated and other clients will be allowed to establish a new lock on the same key. Defaults to <see cref="DefaultLockMaxAge"/>.</param>
    public RedisDlmLock(IRedisClient redisClient, string key, TimeSpan? acquisitionTimeOut = null, TimeSpan? lockMaxAge = null)
    {
        _client = redisClient;
        _lockKey = LockPrefix + key;

        ExecExtensions.RetryUntilTrue(
            () =>
            {
                //Modified from ServiceStack.Redis.RedisLock
                //This pattern is taken from the redis command for SETNX http://redis.io/commands/setnx
                //Calculate a unix time for when the lock should expire

                lockMaxAge = lockMaxAge ?? DefaultLockMaxAge; // hold the lock for the default amount of time if not specified.
                DateTime expireTime = DateTime.UtcNow.Add(lockMaxAge.Value);
                _lockValue = (expireTime.ToUnixTimeMs() + 1).ToString(CultureInfo.InvariantCulture);

                //Try to set the lock, if it does not exist this will succeed and the lock is obtained
                var nx = redisClient.SetEntryIfNotExists(_lockKey, _lockValue);
                if (nx)
                    return true;

                //If we've gotten here then a key for the lock is present. This could be because the lock is
                //correctly acquired or it could be because a client that had acquired the lock crashed (or didn't release it properly).
                //Therefore we need to get the value of the lock to see when it should expire
                string existingLockValue = redisClient.Get<string>(_lockKey);
                long lockExpireTime;
                if (!long.TryParse(existingLockValue, out lockExpireTime))
                    return false;
                //If the expire time is greater than the current time then we can't let the lock go yet
                if (lockExpireTime > DateTime.UtcNow.ToUnixTimeMs())
                    return false;

                //If the expire time is less than the current time then it wasn't released properly and we can attempt to 
                //acquire the lock. This is done by setting the lock to our timeout string AND checking to make sure
                //that what is returned is the old timeout string in order to account for a possible race condition.
                return redisClient.GetAndSetEntry(_lockKey, _lockValue) == existingLockValue;
            },
            acquisitionTimeOut ?? DefaultLockAcquisitionTimeout // loop attempting to get the lock for this amount of time.
            );
    }

    public override string ToString()
    {
        return String.Format("RedisDlmLock:{0}:{1}", _lockKey, _lockValue);
    }

    public void Dispose()
    {
        try
        {
            // only remove the entry if it still contains OUR value
            _client.Watch(_lockKey);
            var currentValue = _client.Get<string>(_lockKey);
            if (currentValue != _lockValue)
            {
                _client.UnWatch();
                return;
            }

            using (var tx = _client.CreateTransaction())
            {
                tx.QueueCommand(r => r.Remove(_lockKey));
                tx.Commit();
            }
        }
        catch (Exception ex)
        {
            // log but don't throw
        }
    }
}

To simplify use as much as possible, I also expose some extension methods for IRedisClient to parallel the AcquireLock method, along these lines:

internal static class RedisClientLockExtensions
{
    public static IDisposable AcquireDlmLock(this IRedisClient client, string key, TimeSpan timeOut, TimeSpan maxAge)
    {
        return new RedisDlmLock(client, key, timeOut, maxAge);
    }
}
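Usage then parallels the built-in API, something like this (a sketch; the key and durations are illustrative): acquisition fails fast after a couple of seconds if another client already holds the lock, while the max age lets the work inside the lock run much longer before the key would be considered expired.

// Sketch only: fail fast on acquisition, but allow long-running work inside
// the lock. If this process crashes, the key expires after the max age and
// another client can recover it.
using (redisClient.AcquireDlmLock("mutex", TimeSpan.FromSeconds(2), TimeSpan.FromMinutes(10)))
{
    // long-running critical section
}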

Upvotes: 3

mythz

Reputation: 143319

Your question highlights the behavior of distributed locking in ServiceStack.Redis: if the specified timeout is exceeded, a timed-out client treats it as an invalid lock and will attempt to auto-recover the lock. If there were no auto-recovery behavior, a crashed client would never release the lock and no further operations waiting on that lock would be allowed through.
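In other words, with the built-in AcquireLock the single timeOut serves as both the acquisition wait and the lock's effective lifetime, so it needs to be longer than your longest critical section. A minimal sketch with illustrative values:

// The single timeOut covers both how long a client waits to acquire the lock
// and how long the lock is considered valid before another waiting client may
// auto-recover it, so it should exceed the longest expected critical section.
using (redisClient.AcquireLock("mutex", TimeSpan.FromSeconds(30)))   // illustrative value
{
    // work that must complete well within 30 seconds
}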

The locking behavior for AcquireLock is encapsulated in the RedisLock class:

public IDisposable AcquireLock(string key, TimeSpan timeOut)
{
    return new RedisLock(this, key, timeOut);
}

Which you can take a copy of and modify to suit the behavior you'd prefer:

using (new MyRedisLock(client, key, timeout))
{
    //...
}

Upvotes: 2
