Shashi

Reputation: 41

Redis scan issue for multiple replica configuration

I am using AWS ElastiCache Redis (cluster mode disabled) with 1 primary and 2 replicas, and I am trying to fetch keys matching a given pattern using the Redis SCAN command (scan(cursor, count, matchGlob)), but it gives inconsistent results, i.e. it does not return the complete set of keys (the number of retrieved keys is less than the expected number).

It works perfectly fine with 1 primary and 1 replica, but I start seeing the issue when I increase the replica count beyond 1.

I have an intuition about what might be going wrong but can't confirm it. SCAN starts with a cursor value of 0, picks up to n (the given count) matching keys, and returns both the results and the next cursor value, which must be passed to the next SCAN call; this repeats until the cursor becomes 0 again, which signals the end of the iteration, by which point all the keys have been collected. But when the scan is directed at the replicas, the first iteration may go to one replica and the second iteration to another, which can return some redundant keys while missing others, and this is what we want to avoid (I don't know if this is actually the case, though).
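The intuition above can be illustrated with a small stand-alone simulation (this is hypothetical code, not real Redis: each "replica" is modelled as a vector holding the same six keys in a different internal order, which can happen because each replica's hash table rehashes independently, and the cursor is modelled as a plain offset). A cursor is only a position inside one instance's table, so reusing it on another replica returns the wrong slice, producing duplicated and missing keys:

```scala
object ScanSimulation {
  val allKeys = Set("k1", "k2", "k3", "k4", "k5", "k6")

  // Same data, different internal layouts on each replica.
  val replicaA = Vector("k1", "k2", "k3", "k4", "k5", "k6")
  val replicaB = Vector("k4", "k5", "k6", "k1", "k2", "k3")

  // One SCAN step: return a page of keys plus the next cursor (0 = done).
  def scanStep(replica: Vector[String], cursor: Int, count: Int): (Seq[String], Int) = {
    val page = replica.slice(cursor, cursor + count)
    val next = if (cursor + count >= replica.length) 0 else cursor + count
    (page, next)
  }

  // Iterate as the scan loop does, but let a load balancer send each
  // successive step to a different replica.
  def scanAcrossReplicas(count: Int): Seq[String] = {
    val replicas = Iterator.continually(Seq(replicaA, replicaB)).flatten
    val buffer = scala.collection.mutable.ListBuffer[String]()
    var cursor = 0
    do {
      val (page, next) = scanStep(replicas.next(), cursor, count)
      buffer ++= page
      cursor = next
    } while (cursor > 0)
    buffer.toSeq
  }
}
```

With a page size of 2, the collected sequence contains k1 and k6 twice and never sees k3 or k4, even though both replicas hold all six keys.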

A few more details:

Redis engine used - 6.2.6
Shard - 1
Number of nodes used - 3 (1 primary, 2 replica)
Cluster Mode - disabled

Here is the Scala code for scanning the keys (I am using the etaty rediscala library, v1.9.0):

import scala.collection.mutable.ListBuffer
import scala.concurrent.Await
import scala.concurrent.duration._

def scan(pattern: String): Seq[String] = {
  val CHUNKSIZE = 10000
  val buffer = ListBuffer[String]()
  var index = 0
  do {
    // Block on each SCAN call; a cursor of 0 marks the end of the iteration.
    val cursor = synchronized {
      Await.result(
        replicasClient.scan(index, Some(CHUNKSIZE), Some(pattern)),
        1.minute
      )
    }
    buffer.addAll(cursor.data)
    index = cursor.index
  } while (index > 0)
  buffer.toSeq
}

I looked at a few documents explaining how SCAN works, but all of them covered either the single-replica case or the cluster-mode-enabled case; none covered multiple replicas with cluster mode disabled.

Highlights: during the scan iteration, the Redis key collection remains fixed; it does not change. The collection is updated throughout the day, except during a specific time window in which the scanning is performed.

Upvotes: 0

Views: 742

Answers (1)

Efran Cobisi

Reputation: 6454

As your keys may change between one SCAN iteration and the next one, there is no guarantee you will get the expected number of keys here, even within a single stand-alone Redis instance.

Quoting the official documentation:

The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process.

If you absolutely need a snapshot of the keys in a given Redis instance, you could use the KEYS command instead, but beware of the negative performance implications (see the official documentation for the details: Redis blocks until all the keys are enumerated, an O(N) operation in the number of keys).

As an alternative to the above, I would suggest reviewing your application logic so that it stores the number of keys you would like to monitor elsewhere, for example in a dedicated key which you can INCR / DECR and GET when needed: this way, you could avoid scanning your keys in the first place.
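A hedged sketch of that counter idea, using an in-memory map as a stand-in for Redis (the method names incr/decr/get mirror the Redis commands; in a real application these would be calls through your Redis client, and the counter key name is made up for illustration):

```scala
object KeyCounter {
  // Stand-in for Redis: missing counters read as 0, like GET on a
  // non-existent key followed by INCR.
  private val store =
    scala.collection.mutable.Map[String, Long]().withDefaultValue(0L)

  def incr(counter: String): Long = { store(counter) += 1; store(counter) }
  def decr(counter: String): Long = { store(counter) -= 1; store(counter) }
  def get(counter: String): Long  = store(counter)
}

// Whenever the application creates or deletes a monitored key, it bumps the
// counter as part of the same logical operation; reading the count back is
// then a single GET instead of a full SCAN over the keyspace.
```

With real Redis, INCR and DECR are atomic, so concurrent writers can safely maintain the counter without any locking in the application.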


Update: given that your keys do not change during the SCAN iteration, the reason you are getting a different number of keys from your replica(s) may be that Redis uses asynchronous replication by default, so your replica(s) may not yet contain the whole set of keys present on your primary at the moment of your SCAN.

To overcome this limitation, you could execute the WAIT command to make sure your primary has synchronized with (at least) n replicas.

As an alternative, which I would opt for, you could simply iterate over your keys on your primary Redis instance, without querying your replica(s).

Upvotes: 2
