Reputation: 193
Is there a way to generate a hash, or any other kind of unique identifier, for all the data in a Cassandra table at a given point in time? I can't find anything that generates a hash directly, but even a way to hash each column and the data it contains would be useful. I want to use this to keep a record of the table so I can see whether it has been modified without authorisation.
Upvotes: 0
Views: 33
Reputation: 16313
That feature doesn't exist [I think] because it doesn't scale and isn't practical for the types of workloads that Cassandra databases handle.
Generating a hash of columns, rows or partitions might work if there are only a few hundred records, maybe a few thousand. But if there are millions or billions of records in each table, generating that hash is not only CPU-intensive but impractical. Setting up detection requires a periodic full table scan, which works on very small clusters, but imagine what it would be like on clusters with 100 or 1,000 nodes.
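To make the cost concrete, here is a minimal sketch of what a client-side table digest could look like. It is not a Cassandra feature. The names `row_digest` and `table_digest` are hypothetical, and the sketch assumes every row is available as a tuple of values whose `repr()` is stable between runs. Each row is hashed with SHA-256 and the row hashes are XOR-ed together, so the result does not depend on the order in which rows are returned.

```python
import hashlib


def row_digest(row):
    """Return the SHA-256 digest of a single row (a tuple of column values)."""
    h = hashlib.sha256()
    for value in row:
        # Assumption: repr() gives a stable text form for every column value.
        encoded = repr(value).encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def table_digest(rows):
    """Combine per-row digests into one order-independent fingerprint.

    XOR makes the result independent of scan order, but it is NOT a
    cryptographically strong set hash, so treat it as change detection only.
    """
    acc = 0
    count = 0
    for row in rows:  # this loop is the full table scan
        acc ^= int.from_bytes(row_digest(row), "big")
        count += 1
    return f"{count}:{acc:064x}"


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the result of a SELECT * scan.
    sample = [(1, "alice"), (2, "bob")]
    print(table_digest(sample))
```

The `rows` argument could be any iterable, for example the result of a `SELECT *` issued through a driver. Every row still has to be read and hashed on each run, which is exactly the full table scan described above.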
Ultimately, a record which has been modified without authorisation is a red flag. It is clearly a breach that has penetrated multiple layers of security. If someone is able to do that, you have a bigger problem to deal with.
You should be using service accounts that can only be used by certain applications. It is possible to configure role-based access control (RBAC) so that permission is only granted on the resources (tables, keyspaces, functions) that a role is authorised to access.
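As a rough illustration of the service-account idea, the sketch below builds the CQL statements for a role that can log in and holds only the listed permissions on one table. The helper name `service_account_statements`, the role, keyspace and table names, and the password placeholder are all hypothetical. The helper only produces strings; running them against a cluster is left to whatever client you use.

```python
def service_account_statements(role, keyspace, table, permissions=("SELECT",)):
    """Build CQL for a login role limited to the given permissions on one table.

    Identifiers cannot be bound as query parameters, so only pass trusted,
    hard-coded names here, never user input.
    """
    statements = [
        # Placeholder password: replace it before running the statement.
        f"CREATE ROLE IF NOT EXISTS {role} "
        f"WITH LOGIN = true AND PASSWORD = '<set-a-secret>'"
    ]
    for permission in permissions:
        statements.append(
            f"GRANT {permission} ON TABLE {keyspace}.{table} TO {role}"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical names: an application role that may read and write one table.
    for cql in service_account_statements(
        "orders_app", "shop", "orders", ("SELECT", "MODIFY")
    ):
        print(cql + ";")
```

A role created this way has no access to other tables or keyspaces unless further grants are issued, and authentication and authorization must be enabled on the cluster for the grants to have any effect.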
You should also implement separation of duties so that, for example, admins don't have direct access to application data, just the parts of the DB related to systems management and operations.
There are other security features in Cassandra which provide the necessary artefacts to support security audit processes. Audit logging captures authentication/authorization events and CQL requests. Full query logging (FQL) also captures CQL requests which can be replayed for debugging or testing. The difference is that audit logging captures both successful and failed attempts, not just the successful ones.
Commercial distributions of Apache Cassandra like DataStax Enterprise (DSE), Hyper-Converged Database (HCD) and Astra DB (C*-as-a-service) provide even more advanced security features like:
For full disclosure, I'm an Apache Cassandra committer and I work at DataStax. Cheers!
Upvotes: 1