Tristate
Tristate

Reputation: 1831

How do I update one column of all rows in a large table in my Spring Boot application?

I have a Spring Boot 2.x project with a big Table in my Cassandra Database. In my Liquibase Migration Class, I need to replace a value from one column in all rows.

For me its a big perfomance hit, when I try to solve this with

SELECT * FROM BOOKING
forEach Row
  Update Row

Because of the total number of rows. Even when I select only 1 Column.

Is it possible to make something like "partwise/pagination" loop?

Pseudecode

Take first 1000 rows
  do Update
Take next 1000 rows
 do Update

loop.

Im also happy about all other solution approaches you have.

Upvotes: 0

Views: 203

Answers (1)

clunven
clunven

Reputation: 1695

Must known:

  • Make sure there is a way to group the updates by partition. If you try a batchUpdate on 1000 rows not in same partition the coordinator of the request will suffer, you are moving the load from your client to the coordinator, and you want the parallelize the writes instead. A batchUpdate with cassandra has nothing to do with the one in relational databases.

  • For fined-grained operations like this you want to go back to the usage of the drivers with CassandraOperations and CqlSession for maximum control

  • There is a way to paginate with Spring Data cassandra using Slice but do not have control over how operations are implemented.

Spring Data Cassandra core

Slice<MyEntity> slice = MyEntityRepo.findAll(CassandraPageRequest.first(size));
while(slice.hasNext() && currpage < page) {
  slice = personrepo.findAll(slice.nextPageable());
  currpage++;
}
slice.getContent();

Drivers:

// Prepare Statements to speed up queries
PreparedStatement selectPS = session.prepare(QueryBuilder
  .selectFrom( "myEntity").all()
  .build()
  .setPageSize(1000)                           // 1000 per pages
  .setTimeout(Duration.ofSeconds(10));         // 10s timeout
PreparedStatement updatePS = session.prepare(QueryBuilder
   .update("mytable")
   .setColumn("myColumn", QueryBuilder.bindMarker())
   .whereColumn("myPK").isEqualTo(QueryBuilder.bindMarker())
   .build()
   .setConsistencyLevel(ConsistencyLevel.ONE)); // Fast writes

// Paginate
ResultSet page1 = session.execute(selectPS);
Iterator<Row> page1Iter = page1.iterator();
while (0 <  page1.getAvailableWithoutFetching()) {
  Row row = page1Iter.next();
  cqlsession.executeAsync(updatePS.bind(...));
}

ByteBuffer pagingStateAsBytes = 
page1.getExecutionInfo().getPagingState();
selectPS.setPagingState(pagingStateAsBytes);
ResultSet page2 = session.execute(selectPS);

You could of course include this pagination in a loop and track progress.

Upvotes: 1

Related Questions