Tristate
Tristate

Reputation: 1821

Would iteration over Cassandra rows with LIMIT and OFFSET have any unexpected side effects?

My Project has a huge Cassadra Table. My Cassadra running with 5 nodes in a kubernetes. As Backend I use Spring Boot 2.X.

I try to update values in my entire Table. Because of the size of the table I do not use a "SELECT * FROM TABLE" query.

Instead I think about to use "limit" with "offset"

String sql = "SELECT * FROM " + tableName + " LIMIT " + limit + " OFFSET " + offset;

With recursive call

private boolean migrateBookingTable(Database database, Statement statement, String tableName, int limit, int offset) throws SQLException, LiquibaseException {
    String sql = "SELECT * FROM " + tableName + " LIMIT " + limit + " OFFSET " + offset;
    try (ResultSet resultSet = statement.executeQuery(sql)) {
        //if resultSet is empty, we are done
        if (!resultSet.isBeforeFirst()) {
            return false;
        }
        while (resultSet.next()) {
         //some logic
        }
        database.execute...
    }

    return migrateBookingTable(database, statement, tableName, limit, offset+limit);
}

I tested it on a small test environment. Its worked. But because of cassandra peculiarities and the fact of 5 Nodes on production. Im not sure about side effects.

Is this an "ok" way to go?

Upvotes: 0

Views: 2157

Answers (1)

clunven
clunven

Reputation: 1685

OFFSET is not part of CQL language, not sure how you tested.

cqlsh:ks1> select * from user LIMIT 1 OFFSET 1;
SyntaxException: line 1:27 mismatched input 'OFFSET' expecting EOF (...* from user LIMIT 1 [OFFSET]...)

Because of the size of the table I do not use a "SELECT * FROM TABLE" query.

Without the awful findAll() allowed by Spring Data Cassandra every request is paged with Cassandra. Why not going with the default behaviour of Paging with the cassandra drivers.

SimpleStatement statement = QueryBuilder.selectFrom(USER_TABLENAME).all().build()
 .setPageSize(10)                    // 10 per pages
 .setTimeout(Duration.ofSeconds(1))  // 1s timeout
 .setConsistencyLevel(ConsistencyLevel.ONE);
 ResultSet page1 = session.execute(statement);
        
 LOGGER.info("+ Page 1 has {} items", 
 page1.getAvailableWithoutFetching());
 Iterator<Row> page1Iter = page1.iterator();
 while (0 <  page1.getAvailableWithoutFetching()) {
   LOGGER.info("Page1: " + page1Iter.next().getString(USER_EMAIL));
 }

 // Getting ready for page2
 ByteBuffer pagingStateAsBytes = page1.getExecutionInfo().getPagingState();
 statement.setPagingState(pagingStateAsBytes);
 ResultSet page2 = session.execute(statement);

Spring Data Also allow paging with Slice

Upvotes: 1

Related Questions