Wayne Christopher

Reputation: 623

Possible Redis data corruption bug

I have seen a data problem in Redis and I am wondering whether my diagnosis is correct. When I write heavily to a server while reading with a Jedis client, I see timeouts followed by get() operations returning incorrect data: the data makes sense, but it belongs to a different key.

Here is what I think is happening:

  1. Master is put under a lot of write load
  2. Slave does a periodic bgsave
  3. Slave tries to catch up to the master but it's gotten too far behind so it does a full re-sync
  4. To serve the full re-sync, master does a bgsave of a 10GB+ data set while handling lots of reads and writes
  5. Jedis client get() call times out before the data comes back from the server
  6. The next get() call on the same client reads the reply to the call that timed out, because that reply arrives in the socket buffer after the timeout but before the next call
  7. From now on, every get() call returns the data intended for the previous one
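
Steps 5 to 7 can be reproduced without Redis or Jedis. The following is a minimal sketch, not the actual Jedis code path: it assumes a toy line-based protocol over a plain socket, and the class name DesyncDemo, the "value-of-" reply format, and the delay values are all made up for illustration. A local server delays its first reply past the client's read timeout, and the client then reuses the same socket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DesyncDemo {
    // Reads bytes up to '\n'; returns null if the peer closed the connection.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') sb.append((char) b);
        if (b == -1 && sb.length() == 0) return null;
        return sb.toString();
    }

    static void writeLine(OutputStream out, String s) throws IOException {
        out.write((s + "\n").getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    // Toy server (not Redis): answers each key with "value-of-<key>".
    // Only the first reply is delayed, mimicking a master busy with a bgsave.
    static void serve(ServerSocket server, long firstReplyDelayMs) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept()) {
                boolean first = true;
                String key;
                while ((key = readLine(s.getInputStream())) != null) {
                    if (first) { Thread.sleep(firstReplyDelayMs); first = false; }
                    writeLine(s.getOutputStream(), "value-of-" + key);
                }
            } catch (IOException | InterruptedException e) {
                // Connection closed or thread interrupted; nothing to do in a demo.
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // Issues three "get" calls on ONE socket and records what each call observed.
    static List<String> run() throws Exception {
        List<String> log = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            serve(server, 600);
            try (Socket sock = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                sock.setSoTimeout(200); // client read timeout, shorter than the server delay
                for (String key : new String[] {"a", "b", "c"}) {
                    writeLine(sock.getOutputStream(), key);
                    try {
                        log.add(key + " -> " + readLine(sock.getInputStream()));
                    } catch (SocketTimeoutException e) {
                        log.add(key + " -> timeout");
                        Thread.sleep(1000); // the late reply lands in the socket buffer meanwhile
                    }
                }
            }
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        run().forEach(System.out::println);
    }
}
```

In this sketch the request for "a" times out, and the later requests for "b" and "c" should come back with the values for "a" and "b" respectively, which matches the off-by-one pattern described in step 7.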

My solution, which seems to work, is to close and reopen the connection every time a timeout exception is thrown.

Does this seem like a plausible explanation for what I am seeing?

Upvotes: 2

Views: 1188

Answers (1)

The Real Bill

Reputation: 15773

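What you are describing would not be a Redis bug but a Jedis one, as the offset reads would be happening in the client: the server sends each reply once and in order, and it is the client that pairs a late reply with the wrong request.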

In this case a workaround to reconnect on timeout would be reasonable and should work. I'd also recommend submitting it as a bug to Jedis.
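
The reconnect-on-timeout workaround might look roughly like the sketch below. It is dependency-free on purpose: Conn, SafeClient, and FakeConn are hypothetical names rather than Jedis classes, and the timeout is modeled as a plain SocketTimeoutException. With Jedis itself, I believe a read timeout surfaces as a JedisConnectionException, so the catch clause would need adjusting.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class ReconnectOnTimeout {
    /** Hypothetical minimal connection interface; stands in for a real client connection. */
    interface Conn extends Closeable {
        String get(String key) throws IOException;
    }

    /** Wrapper that never reuses a connection after a read timeout. */
    static class SafeClient {
        private final Supplier<Conn> factory;
        private Conn conn;

        SafeClient(Supplier<Conn> factory) {
            this.factory = factory;
            this.conn = factory.get();
        }

        String get(String key) throws IOException {
            try {
                return conn.get(key);
            } catch (SocketTimeoutException e) {
                // The late reply may still arrive on this socket, so discard it.
                try { conn.close(); } catch (IOException ignored) { }
                conn = factory.get();
                throw e; // the caller decides whether to retry
            }
        }
    }

    /** Fake connection for demonstration: a timed-out reply stays queued on the connection. */
    static class FakeConn implements Conn {
        private final Deque<String> pending = new ArrayDeque<>();

        @Override public String get(String key) throws IOException {
            pending.addLast("value-of-" + key);
            // The key "slow" simulates a reply that arrives only after the timeout.
            if (key.equals("slow")) throw new SocketTimeoutException("Read timed out");
            return pending.pollFirst();
        }

        @Override public void close() { pending.clear(); }
    }

    // One timed-out call followed by a normal call through the wrapper.
    static String demo() throws IOException {
        SafeClient client = new SafeClient(FakeConn::new);
        try {
            client.get("slow");
        } catch (SocketTimeoutException expected) {
            // SafeClient has already replaced the connection at this point.
        }
        return client.get("b");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Rethrowing after the reconnect keeps the retry policy with the caller. A bare FakeConn reused after the timeout would hand back the stale reply for "slow" on the next call, while the wrapper starts that call on a fresh connection.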

Upvotes: 2
