CRCerr0r

Reputation: 535

Authentication failures in Cassandra when 1 of 18 nodes is down

I have a Cassandra cluster running:

Cassandra 2.0.11.83 | DSE 4.6.0 | CQL spec 3.1.1 | Thrift protocol 19.39.0

The cluster has 18 nodes, split among 3 datacenters, 6 in each. My system_auth keyspace has the following replication defined:

replication = { 'class': 'NetworkTopologyStrategy', 'DC1': '4', 'DC2': '4', 'DC3': '4'}

and my authenticator/authorizer are set to:

authenticator: org.apache.cassandra.auth.PasswordAuthenticator

authorizer: org.apache.cassandra.auth.CassandraAuthorizer
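As a quick sanity check on the replication settings above, here is a minimal sketch of the replica arithmetic. It assumes the standard quorum formula (floor(n / 2) + 1) and that a single down node holds at most one replica of any given auth row; it says nothing about which consistency level the auth queries actually use.

```python
# Replication factors copied from the system_auth definition above.
replication = {"DC1": 4, "DC2": 4, "DC3": 4}


def quorum(replicas: int) -> int:
    """Replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


total_replicas = sum(replication.values())   # copies of each auth row, cluster-wide
cluster_quorum = quorum(total_replicas)      # responses needed for QUORUM
local_quorum = quorum(replication["DC1"])    # responses needed for LOCAL_QUORUM in DC1

# Worst case with one DC1 node down: it held one of the DC1 replicas.
dc1_replicas_up = replication["DC1"] - 1
total_replicas_up = total_replicas - 1

print(f"QUORUM needs {cluster_quorum}, {total_replicas_up} still up")
print(f"LOCAL_QUORUM (DC1) needs {local_quorum}, {dc1_replicas_up} still up")
```

If that arithmetic holds, losing one node should not make the auth data unreachable at ONE, LOCAL_QUORUM, or QUORUM, which would point toward replicas holding different data rather than too few replicas being available.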

This morning I brought down one of the nodes in DC1 for maintenance. Within a few seconds to a minute, client applications started logging exceptions like this:

"User my_application_user has no MODIFY permission on or any of its parents"

Running 'LIST ALL PERMISSIONS OF my_application_user' on one of the other nodes shows that the user has SELECT and MODIFY on the keyspace xxxxx, so I am rather confused. Do I have a setup issue, or is this a bug of some sort?

Upvotes: 3

Views: 1361

Answers (1)

CRCerr0r

Reputation: 535

Re-posting this as the answer, as BrianC suggested above.

So this is resolved... Here's the sequence of events that seems to have fixed it:

  1. Add 18 more nodes.
  2. Run cleanup on the original nodes (this was part of the original plan).
  3. Run a scrub on one table, since it was throwing exceptions during cleanup.
  4. Run a repair on the system_auth keyspace on the original troubled node.
  5. Wait for the repair service to complete a full pass on all keyspaces.
  6. Decommission the original 18 nodes.
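
For anyone retracing this, the steps that map onto a single nodetool command can be sketched roughly as below. This is a dry run that only prints the commands; the keyspace and table names are hypothetical placeholders (the table that needed scrubbing is not named here), and steps 1 and 5 are left out because they are not one-line commands.

```python
from typing import List

# Hypothetical placeholders: substitute the real keyspace and the table
# that threw exceptions during cleanup.
KEYSPACE = "my_keyspace"
TABLE = "my_table"


def recovery_commands(keyspace: str, table: str) -> List[List[str]]:
    """Return the nodetool invocations for steps 2, 3, 4 and 6, in order."""
    return [
        ["nodetool", "cleanup"],                 # step 2: run on each original node
        ["nodetool", "scrub", keyspace, table],  # step 3: the table that failed cleanup
        ["nodetool", "repair", "system_auth"],   # step 4: run on the troubled node
        ["nodetool", "decommission"],            # step 6: run on each original node
    ]


# Dry run: print the commands rather than executing them.
for argv in recovery_commands(KEYSPACE, TABLE):
    print(" ".join(argv))
```

Each command acts on the node it is run against, so in practice they would be issued per node (or with nodetool's host option) rather than once for the whole cluster.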

Honestly, I don't know which step fixed it. The system_auth repair makes the most sense, but repair had already completed many passes before without fixing the problem, so I can't explain why it would work this time. I hope this at least helps someone.

Upvotes: 1