Cassandra Node.js driver times out after a node moves

Question

We use vnodes on our cluster.

I noticed that when the token space of a node changes (automatically with vnodes, during a repair or a cleanup after adding new nodes), the DataStax Node.js driver receives many "Operation timed out - received only X responses" errors for a few minutes.

I tried both the ONE and LOCAL_QUORUM consistency levels.

I suppose this happens because the coordinator does not reach the right node just after the move. That seems logical (the data was moved), but we really want to address this particular issue.
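To make the "right node" idea concrete, here is a simplified sketch of token ownership under the Murmur3Partitioner: tokens are signed 64-bit integers, and a node owns the range ending at its own token, wrapping around the ring. It only finds the primary replica (replication factor and vnodes with many tokens per node are ignored), and `ownerOf` plus the hosts "B" and "C" are hypothetical; only the two tokens come from the log line in this question.

```javascript
// Simplified illustration of token ownership on a Cassandra ring
// (Murmur3Partitioner: tokens are signed 64-bit integers, held here as BigInt).
// Each node owns the range (previous node token, its own token], wrapping around.
// Ignores replication and the many tokens per node that vnodes assign.

/**
 * @param {Array<{token: bigint, node: string}>} ring  one entry per node token
 * @param {bigint} token                                token of a partition key
 * @returns {string} the node that owns the token (primary replica only)
 */
function ownerOf(ring, token) {
  const sorted = [...ring].sort((a, b) =>
    a.token < b.token ? -1 : a.token > b.token ? 1 : 0
  );
  // The first node token that is >= the key's token owns it...
  for (const entry of sorted) {
    if (token <= entry.token) return entry.node;
  }
  // ...otherwise the range wraps around to the smallest token on the ring.
  return sorted[0].node;
}

// Hypothetical ring before and after 172.31.34.155 moves (tokens from the log).
const before = [
  { token: 8185241953623605265n, node: "172.31.34.155" },
  { token: 0n, node: "B" },
  { token: -5000000000000000000n, node: "C" },
];
const after = [
  { token: -1108852503760494577n, node: "172.31.34.155" },
  { token: 0n, node: "B" },
  { token: -5000000000000000000n, node: "C" },
];
console.log(ownerOf(before, 5000000000000000000n)); // 172.31.34.155
console.log(ownerOf(after, 5000000000000000000n)); // C
```

After the move, the same key token maps to a different node, which is why a client or coordinator using stale ring information could route a request to a replica that no longer owns the data.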

What would you suggest we do to avoid this? A custom retry policy? Caching? A different consistency level?
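One form the "custom retry" option could take is a thin wrapper that retries only read timeouts a bounded number of times with a short backoff. This is a hypothetical helper (`executeWithReadTimeoutRetry` is not part of cassandra-driver); it relies only on the error's `code` field, where 4608 (0x1200) is the native protocol's Read_timeout code seen in the error below. The driver's own extension point for this is a subclass of `policies.retry.RetryPolicy` overriding `onReadTimeout`; check its documentation for your driver version before choosing either approach.

```javascript
// Hypothetical helper, not part of cassandra-driver: retries a query when the
// server reports a read timeout, waiting a little longer between each attempt.

// CQL native protocol error code for Read_timeout (0x1200 === 4608).
const READ_TIMEOUT = 0x1200;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * @param {() => Promise<any>} execute  e.g. () => client.execute(query, params, { prepare: true })
 * @param {{ maxRetries?: number, delayMs?: number }} [options]
 */
async function executeWithReadTimeoutRetry(
  execute,
  { maxRetries = 2, delayMs = 100 } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await execute();
    } catch (err) {
      // Rethrow anything that is not a read timeout, or once retries run out.
      if (err.code !== READ_TIMEOUT || attempt >= maxRetries) throw err;
      // Linear backoff so retries do not pile onto an already busy node.
      await sleep(delayMs * (attempt + 1));
    }
  }
}
```

Retrying is only safe for idempotent statements such as the SELECT in this question, and it hides the symptom rather than explaining why the replica is slow to answer.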

Example of behavior

When we see this:

4/7/2016, 10:43am   Info    Host 172.31.34.155 moved from '8185241953623605265' to '-1108852503760494577'

we see a spike of errors like this one:

{
  "message":"Operation timed out - received only 0 responses.",
  "info":"Represents an error message from the server",
  "code":4608,
  "consistencies":1,
  "received":0,
  "blockFor":1,
  "isDataPresent":0,
  "coordinator":"172.31.34.155:9042",
  "query":"SELECT foo FROM foo_bar LIMIT 10"
}
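The numeric fields in this error can be decoded with the CQL native protocol's tables: code 4608 (0x1200) is Read_timeout, and consistency value 1 is ONE. The small decoder below is a hypothetical helper (`describeServerError` is illustrative, not a driver API) that turns the object into a readable line. Note that the coordinator in the error, 172.31.34.155, is the same host the log line reports as moved.

```javascript
// Illustrative decoder for the server error object shown above.
// Numeric values follow the CQL native protocol specification.

const ERROR_CODES = {
  0x1000: "Unavailable",
  0x1100: "Write_timeout",
  0x1200: "Read_timeout",
};

// Consistency levels indexed by their wire value (0x0000 .. 0x000A).
const CONSISTENCIES = [
  "ANY", "ONE", "TWO", "THREE", "QUORUM", "ALL",
  "LOCAL_QUORUM", "EACH_QUORUM", "SERIAL", "LOCAL_SERIAL", "LOCAL_ONE",
];

function describeServerError(err) {
  const kind = ERROR_CODES[err.code] || `0x${err.code.toString(16)}`;
  const cl = CONSISTENCIES[err.consistencies] || String(err.consistencies);
  let text = `${kind} at ${cl}: coordinator ${err.coordinator} waited for ` +
    `${err.blockFor} replica(s), got ${err.received}`;
  // isDataPresent is only meaningful for read timeouts.
  if (err.code === 0x1200) {
    text += `, data ${err.isDataPresent ? "present" : "not present"}`;
  }
  return text;
}
```

For the error above this yields "Read_timeout at ONE: coordinator 172.31.34.155:9042 waited for 1 replica(s), got 0, data not present", meaning that even at ONE no replica answered before the server-side read timeout.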

doanduyhai · Accepted Answer

"I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue."

In fact, when you add a new node, token ranges do move, but Cassandra can still serve read requests using the old token ranges until the scale-out has completely finished. So the behavior you are seeing is very suspicious.

If you can reproduce this error, please enable query tracing to narrow down the issue.
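With the Node.js driver, tracing can be requested per statement. The sketch below assumes cassandra-driver 4.x, where `client.execute()` accepts a `traceQuery` option, exposes the trace session id as `result.info.queryTrace`, and `client.metadata.getTrace()` returns a Promise when called without a callback; the event fields (`source`, `elapsed`, `activity`) are assumed from the driver's trace object, so verify them against your version. `traceQuery` the function is a hypothetical wrapper and `client` is an already-connected `Client`.

```javascript
// Sketch assuming cassandra-driver 4.x (Promise-returning execute/getTrace).
// `client` is an already-connected cassandra-driver Client instance.

async function traceQuery(client, query, params = []) {
  // traceQuery: true asks Cassandra to record a trace session for this request.
  const result = await client.execute(query, params, { traceQuery: true });
  // The trace session id is exposed on the result's execution info.
  const traceId = result.info.queryTrace;
  const trace = await client.metadata.getTrace(traceId);
  // Each event shows which node did what, and how long after the start.
  for (const event of trace.events) {
    console.log(`${event.source} +${event.elapsed}us ${event.activity}`);
  }
  return trace;
}
```

Running it on the SELECT from the question during a spike shows which replicas the coordinator contacted and where the time went; `TRACING ON` in cqlsh is an alternative that needs no code.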

The error could also be caused by a node that is under heavy load and not replying fast enough.
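One cheap way to test the heavy-load hypothesis from the client side is to tally read timeouts by the `coordinator` field of the error. If they concentrate on one host, such as the node that just moved, that node is the first place to look (for example with `nodetool tpstats` for pending and dropped reads). `timeoutsByCoordinator` is a hypothetical helper; it assumes you collect the caught error objects yourself.

```javascript
// Hypothetical diagnostic: count Read_timeout errors (code 0x1200) per
// coordinator and return them sorted, most affected coordinator first.

function timeoutsByCoordinator(errors) {
  const counts = new Map();
  for (const err of errors) {
    if (err.code !== 0x1200) continue; // ignore everything but read timeouts
    counts.set(err.coordinator, (counts.get(err.coordinator) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

Feeding it the errors caught during one of the spikes gives a list of `[coordinator, count]` pairs; a single dominant entry points at one overloaded node rather than at routing.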
