Reputation: 1121
I am using the DataStax Node.js driver for Cassandra, and I want to avoid the very frequent I/O operations that inserts will cause in my application. I will be doing around 1000 inserts per second and want to group them together and perform 1 I/O, instead of running individual queries, which would cause 1000 I/Os. I came across batch statements like the one below:
const query1 = 'UPDATE user_profiles SET email = ? WHERE key = ?';
const query2 = 'INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)';
const queries = [
  { query: query1, params: [emailAddress, 'hendrix'] },
  { query: query2, params: ['hendrix', 'Changed email', new Date()] }
];
client.batch(queries, { prepare: true }, function (err) {
  // All queries have been executed successfully
  // Or none of the changes have been applied, check err
});
The problem here is that they are atomic. I want the other statements to succeed even if one of them fails. Is there something I can do to achieve that?
Upvotes: 1
Views: 671
Reputation: 2996
A batch statement across multiple partitions (which is the case with your write statements) uses a LOGGED batch by default. This is what gives you the atomicity property. If you really want to remove the atomicity, you should use an UNLOGGED batch. Be aware, however, that an UNLOGGED batch across multiple partitions is an anti-pattern: https://issues.apache.org/jira/browse/CASSANDRA-9282. Let me try to explain:
When using a batch statement, you have 4 possible cases: a LOGGED or an UNLOGGED batch, each against either a single partition or multiple partitions.
To make it more concrete: when you issue what you call a 'single I/O' batch statement across multiple partitions, the coordinator still has to slice your 'single I/O' into 1000 I/Os (this would not be the case if all the writes were on the same partition) and coordinate them across multiple replicas.
To conclude, you might observe a performance improvement on the client side, but you will induce a much larger cost on the Cassandra side.
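If you still want batches, one way to stay in the cheap case (all writes of a batch on the same partition) is to group the statements by partition key on the client and send one UNLOGGED batch per group. The following is only a minimal sketch under assumptions: `groupByPartition` and `sendPerPartition` are hypothetical helper names, `keyOf` is a function you supply that returns the partition key of a statement, and `client` is assumed to be a connected `cassandra-driver` `Client`. The `logged: false` query option is what makes the batch unlogged.

```javascript
// Hypothetical helper: bucket statements by partition key so that each
// batch only ever touches one partition.
// Assumption: keyOf(statement) returns the partition key of that statement.
function groupByPartition(statements, keyOf) {
  const groups = new Map();
  for (const statement of statements) {
    const key = keyOf(statement);
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(statement);
  }
  return groups;
}

// Hypothetical helper: send one UNLOGGED batch per partition.
// Assumption: `client` is a connected cassandra-driver Client, whose
// batch() returns a promise when no callback is passed.
// Promise.allSettled reports each batch separately, so a failed batch
// does not hide the outcome of the others.
async function sendPerPartition(client, statements, keyOf) {
  const groups = groupByPartition(statements, keyOf);
  const pending = [];
  for (const queries of groups.values()) {
    pending.push(client.batch(queries, { prepare: true, logged: false }));
  }
  return Promise.allSettled(pending);
}
```

Note that this only makes the batches independent of each other: the statements inside one batch are still sent as a single request and succeed or fail together.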
You might want to read the following blog post: http://batey.info/cassandra-anti-pattern-misuse-of.html, in particular the section commenting on the use of UNLOGGED batches against multiple partitions:
What this is actually doing is putting a huge amount of pressure on a single coordinator. This is because the coordinator needs to forward each individual insert to the correct replicas. You're losing all the benefit of token aware load balancing policy as you're inserting different partitions in a single round trip to the database.
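The alternative this points toward is to send each insert as its own prepared statement and keep many of them in flight at once. That way the token-aware policy can route every write to a replica directly, and a failure affects only that one write. A minimal sketch, assuming `client` is a connected `cassandra-driver` `Client` (whose `execute` returns a promise when no callback is passed); the helper name `insertIndependently` and the default limit of 100 in-flight requests are hypothetical choices, not values taken from the driver or the blog post:

```javascript
// Hypothetical helper: run the same prepared INSERT for every row, with at
// most `concurrency` requests in flight, collecting failures instead of
// aborting on the first one.
// Assumption: client.execute(query, params, options) returns a promise.
async function insertIndependently(client, query, rows, concurrency = 100) {
  const failures = [];
  let next = 0;

  // Each worker keeps pulling the next unprocessed row until none are left.
  async function worker() {
    while (next < rows.length) {
      const index = next++;
      try {
        await client.execute(query, rows[index], { prepare: true });
      } catch (err) {
        // Record the failed row; the other writes are unaffected.
        failures.push({ index, params: rows[index], error: err });
      }
    }
  }

  const workerCount = Math.min(concurrency, rows.length);
  const workers = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return failures;
}
```

The returned array lists the rows that failed, so the caller can retry or log just those, while every other insert has already been applied.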
Upvotes: 4