how to update data in cassandra using IN operator

Question

I have a table with the following schema.

CREATE TABLE IF NOT EXISTS group_friends(
groupId timeuuid,
friendId bigint,
time bigint,
PRIMARY KEY(groupId,friendId));

I need to keep a track of time if any changes happen in a group (such changing the group name or adding a new friend in table etc.). So I need to update the value of time field by groupId every time there is any change in any related table.

As update in cassandra requires mentioning all primary keys in where clause this query will not run.

update group_friends set time = 123456 where groupId = 100;

So I can do something like this.

update group_friends set time=123456 where groupId=100 and friendId in (...);

But it is showing the following error-->

[Invalid query] message="Invalid operator IN for PRIMARY KEY part friendid"

Is there any way to perform an update operation using IN operator in clustering column? If not then what are the possible ways to do this?

Thanks in advance.

Andy Tolbert · Accepted Answer

Since friendId is a clustering column, a batch operation is probably a reasonable and well performing choice in this case since all updates would be made in the same partition (assuming you are using the same group id for the update). For example, with the java driver you could do the following:

Cluster cluster = new Cluster.Builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("friends");

PreparedStatement updateStmt = session.prepare("update group_friends set time = ? where groupId = ? and friendId = ?");

long time = 123456;
UUID groupId = UUIDs.startOf(0);
List friends = Lists.newArrayList(1L, 2L, 4L, 8L, 22L, 1002L);
BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
for(Long friendId : friends) {
    batch.add(updateStmt.bind(time, groupId, friendId));
}
session.execute(batch);
cluster.close();

The other advantage of this is that since the partition key can be inferred from the BatchStatement, the driver will use token-aware routing to send a request to a replica that would own this data, skipping a network hop.

Although this will effectively be a single write, be careful with the size of your batches. You should take care not to make it too large.

In the general case, you can't really go wrong by executing each statement individually instead of using a batch. The CQL transport allows many requests on a single connection and are asynchronous in nature, so you can have many requests going on at a time without the typical performance cost of a request per connection.

For more about writing data in batch see: Cassandra: Batch loading without the Batch keyword

Alternatively, there may be an even easier way to accomplish what you want. If what you are really trying to accomplish is to maintain a group update time and you want it to be the same for all friends in the group, you can make time a static column. This is a new feature in Cassandra 2.0.6. What this does is shares the column value for all rows in the groupId partition. This way you would only have to update time once, you could even set the time in the query you use to add a friend to the group so it's done as one write operation.

CREATE TABLE IF NOT EXISTS friends.group_friends(
  groupId timeuuid,
  friendId bigint,
  time bigint static,
  PRIMARY KEY(groupId,friendId)
);

If you can't use Cassandra 2.0.6+ just yet, you can create a separate table called group_metadata that maintains the time for a group, i.e.:

CREATE TABLE IF NOT EXISTS friends.group_metadata(
  groupId timeuuid,
  time bigint,
  PRIMARY KEY(groupId)
);

The downside here being that whenever you want to get at this data you need to select from this table, but that seems manageable.

how to update data in cassandra using IN operator

Answers (1)

Related Questions