Reputation: 26688
I recently switched an MVC application that serves data feeds and dynamically generated images (6k rpm throughput) from the v3.9.67 ServiceStack.Redis client to the latest StackExchange.Redis client (v1.0.450) and I'm seeing some slower performance and some new exceptions.
Our Redis instance is S4 level (13GB), CPU shows a fairly constant 45% or so and network bandwidth appears fairly low. I'm not entirely sure how to interpret the gets/sets graph in our Azure portal, but it shows us around 1M gets and 100k sets (appears that this may be in 5 minute increments).
The client library switch was straightforward and we are still using the v3.9 ServiceStack JSON serializer so that the client lib was the only piece changing.
Our external monitoring with New Relic shows clearly that our average response time increases from about 200ms to about 280ms between ServiceStack and StackExchange libraries (StackExchange being slower) with no other change.
We recorded a number of exceptions with messages along the lines of:
Timeout performing GET feed-channels:ag177kxj_egeo-_nek0cew, inst: 12, mgr: Inactive, queue: 30, qu=0, qs=30, qc=0, wr=0/0, in=0/0
I understand this to mean that there are a number of commands in the queue that have been sent but have not yet received a response from Redis, and that this can be caused by long-running commands that exceed the timeout. These errors appeared during a period when the SQL database behind one of our data services was getting backed up, so perhaps that was the cause? After scaling out that database to reduce load we have seen very few of these errors, but the DB query should be happening in .NET, and I don't see how that would hold up a Redis command or connection.
We also recorded a large batch of errors this morning over a short period (couple of minutes) with messages like:
No connection is available to service this operation: SETEX feed-channels:vleggqikrugmxeprwhwc2a:last-retry
We were used to transient connection errors with the ServiceStack library, and those exception messages were usually like this:
Unable to Connect: sPort: 63980
I'm under the impression that SE.Redis should be retrying connections and commands in the background for me. Do I still need to be wrapping our calls through SE.Redis in a retry policy of my own? Perhaps different timeout values would be more appropriate (though I'm not sure what values to use)?
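For illustration, a retry wrapper of the kind asked about could look like the sketch below. It is only a minimal example under stated assumptions: `RetryHelper`, `Execute`, and all of its parameters are hypothetical names, not part of SE.Redis, and the fixed delay between attempts is an arbitrary choice.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of StackExchange.Redis.
public static class RetryHelper
{
    // Runs the operation up to maxAttempts times. An exception is retried only
    // when isTransient returns true and attempts remain; otherwise it propagates.
    public static T Execute<T>(
        Func<T> operation,
        int maxAttempts,
        TimeSpan delay,
        Func<Exception, bool> isTransient)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Fixed pause before the next attempt (assumption: no backoff).
                Thread.Sleep(delay);
            }
        }
    }
}
```

A call such as `_redis.StringGet(key)` would be passed in as the `operation` lambda, with `isTransient` deciding which exception types are worth retrying (for example timeouts and connection failures). Whether such a wrapper is needed at all on top of SE.Redis is exactly the open question here.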
Our Redis connection string sets these parameters: abortConnect=false,syncTimeout=2000,ssl=true. We use a singleton instance of ConnectionMultiplexer and transient instances of IDatabase.
The vast majority of our Redis use goes through a Cache class, and the important bits of the implementation are below, in case we're doing something silly that's causing us problems.
Our keys are generally 10-30 or so character strings. Values are largely scalar or reasonably small serialized object sets (hundred bytes to a few kB generally), though we do also store jpg images in the cache so a large chunk of the data is from a couple hundred kB to a couple MB.
Perhaps I should be using different multiplexers for small and large values, probably with longer timeouts for the larger values? Or a few multiplexers in case one is stalled?
    public class Cache : ICache
    {
        private readonly IDatabase _redis;

        public Cache(IDatabase redis)
        {
            _redis = redis;
        }

        // storing this placeholder value allows us to distinguish between a stored null and a non-existent key
        // while only making a single call to redis. see Exists method.
        static readonly string NULL_PLACEHOLDER = "$NULL_VALUE$";

        // this is a dictionary of https://github.com/StephenCleary/AsyncEx/wiki/AsyncLock
        private static readonly ILockCache _locks = new LockCache();

        public T GetOrSet<T>(string key, TimeSpan cacheDuration, Func<T> refresh) {
            T val;
            if (!Exists(key, out val)) {
                using (_locks[key].Lock()) {
                    if (!Exists(key, out val)) {
                        val = refresh();
                        Set(key, val, cacheDuration);
                    }
                }
            }
            return val;
        }

        private bool Exists<T>(string key, out T value) {
            value = default(T);
            var redisValue = _redis.StringGet(key);
            if (redisValue.IsNull)
                return false;
            if (redisValue == NULL_PLACEHOLDER)
                return true;
            value = typeof(T) == typeof(byte[])
                ? (T)(object)(byte[])redisValue
                : JsonSerializer.DeserializeFromString<T>(redisValue);
            return true;
        }

        public void Set<T>(string key, T value, TimeSpan cacheDuration)
        {
            if (value.IsDefaultForType())
                _redis.StringSet(key, NULL_PLACEHOLDER, cacheDuration);
            else if (typeof (T) == typeof (byte[]))
                _redis.StringSet(key, (byte[])(object)value, cacheDuration);
            else
                _redis.StringSet(key, JsonSerializer.SerializeToString(value), cacheDuration);
        }

        public async Task<T> GetOrSetAsync<T>(string key, Func<T, TimeSpan> getSoftExpire, TimeSpan additionalHardExpire, TimeSpan retryInterval, Func<Task<T>> refreshAsync) {
            var softExpireKey = key + ":soft-expire";
            var lastRetryKey = key + ":last-retry";
            T val;
            if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
                return val;
            using (await _locks[key].LockAsync()) {
                if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
                    return val;
                Set(lastRetryKey, DateTime.UtcNow, additionalHardExpire);
                try {
                    var newVal = await refreshAsync();
                    var softExpire = getSoftExpire(newVal);
                    var hardExpire = softExpire + additionalHardExpire;
                    if (softExpire > TimeSpan.Zero) {
                        Set(key, newVal, hardExpire);
                        Set(softExpireKey, DateTime.UtcNow + softExpire, hardExpire);
                    }
                    val = newVal;
                }
                catch (Exception ex) {
                    if (val == null)
                        throw;
                }
            }
            return val;
        }

        private bool ShouldReturnNow<T>(string valKey, string softExpireKey, string lastRetryKey, TimeSpan retryInterval, out T val) {
            if (!Exists(valKey, out val))
                return false;
            var softExpireDate = Get<DateTime?>(softExpireKey);
            if (softExpireDate == null)
                return true;
            // value is in the cache and not yet soft-expired
            if (softExpireDate.Value >= DateTime.UtcNow)
                return true;
            var lastRetryDate = Get<DateTime?>(lastRetryKey);
            // value is in the cache, it has soft-expired, but it's too soon to try again
            if (lastRetryDate != null && DateTime.UtcNow - lastRetryDate.Value < retryInterval) {
                return true;
            }
            return false;
        }
    }
Upvotes: 3
Views: 2234
Reputation: 4154
A few recommendations:
- You can use different multiplexers with different timeout values for different types of keys/values: http://azure.microsoft.com/en-us/documentation/articles/cache-faq/
- Make sure you are not network bound on the client or the server. If you are network bound on the server, move to a higher SKU, which has more bandwidth.

Please read this post for more details: http://azure.microsoft.com/blog/2015/02/10/investigating-timeout-exceptions-in-stackexchange-redis-for-azure-redis-cache/
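As a rough sketch of the first recommendation, the routing between two connections could be expressed as below. `ConnectionRouter`, its member names, and the idea of deciding by key are assumptions made for illustration, not an API of StackExchange.Redis. The type is generic so the sketch stays independent of the client library; in practice `TConnection` would be something like a `ConnectionMultiplexer` or `IDatabase`.

```csharp
using System;

// Hypothetical helper (not part of StackExchange.Redis): picks one of two
// long-lived connections based on the key being accessed.
public sealed class ConnectionRouter<TConnection>
{
    private readonly TConnection _shortTimeout;
    private readonly TConnection _longTimeout;
    private readonly Func<string, bool> _isLargeValueKey;

    public ConnectionRouter(
        TConnection shortTimeout,
        TConnection longTimeout,
        Func<string, bool> isLargeValueKey)
    {
        if (isLargeValueKey == null) throw new ArgumentNullException(nameof(isLargeValueKey));
        _shortTimeout = shortTimeout;
        _longTimeout = longTimeout;
        _isLargeValueKey = isLargeValueKey;
    }

    // Keys expected to hold large values (e.g. images) use the connection
    // configured with the longer timeout; everything else uses the other one.
    public TConnection For(string key)
    {
        return _isLargeValueKey(key) ? _longTimeout : _shortTimeout;
    }
}
```

Each of the two connections would be created once from the same connection string, differing only in `syncTimeout`, and kept for the lifetime of the application. Routing by key rather than by value size is a deliberate choice in this sketch, because the size of a value is not known before a GET.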
Upvotes: 3