chirag

Reputation: 218

Cassandra Pagination Using Datastax driver 3.6: Null paging state and fetch size not honoured

We are building an application that returns paginated results from Cassandra to a UI.

The UI passes fetchSize and pagingState to our API, and we return a List<MyObject> of size fetchSize. If a pagingState is passed, we resume the query from the end of the previous page (as described in the Cassandra docs: https://docs.datastax.com/en/developer/java-driver/3.6/manual/paging/).

Please note that I'm using Cassandra driver version 3.6.

But when we implemented this, Cassandra always returned all entries in the database, ignoring the fetch size, which in turn resulted in a null value from ResultSet.getExecutionInfo().getPagingState(). How do I solve this?

I created 16 records in my database for MyObject and tried passing a fetch size of 5 to get them. All 16 records have the same partition key ID-1.

// Util method to invoke Statement. "session" is cassandra session 

public static ResultSet execute(int pageSize, Statement statement, String pageState) { 
    if (isVoid(pageSize)) {
        pageSize=-1;
    }
    statement.setFetchSize(pageSize);
    if (!isVoid(pageState)) {
        statement.setPagingState(PagingState.fromString(pageState));
    }
    return session.execute(statement);
}

// Accessor interface method for my query that returns a Statement object

@Query("SELECT * FROM " + MY_TABLE + " WHERE id=:id")
Statement getAll(@Param("id") String id);

// Main code returning a list of MyObject; "mapper" is the object Mapper
Statement statement=accessor.getAll("ID1");
ResultSet rs=execute(5,statement,null );
List<MyObject> list=mapper.map(rs).all();
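// getPagingState() returns a PagingState (null when no pages remain), not a String
PagingState pagingState=rs.getExecutionInfo().getPagingState();
String pageState=pagingState==null ? null : pagingState.toString();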

In the above code, I expected Cassandra to return a list of 5 MyObject objects and have a string value for my pageState variable. Neither worked as expected.

The list had a size of 16 (it fetched all records), and because every record had already been fetched, pageState was null.

What am I missing here?

EDIT: From observation, the ResultSet does honour the fetchSize passed in the statement, but when we map it to a List<MyObject> using the all() method, it fetches all the results in the database (of size = cluster-wide fetchSize). When I instead invoked Result#one 5 (= pageSize) times and pushed the results into a List, I got both the paging state and a result list of size pageSize.

Sample util method for the above:

public static <T> List<T> getPaginatedList(ResultSet resultSet, Mapper<T> mapper,int pageSize) {
    List<T> entities=new ArrayList<>();
    Result<T> result=mapper.map(resultSet);
    IntStream.range(1,pageSize).forEach(i->{
        entities.add(result.one());
    });
    return entities;
}

What is the performance impact of this?

Upvotes: 2

Views: 1425

Answers (1)

Andy Tolbert

Reputation: 11638

As you were able to discern, you get all results back despite calling setFetchSize because the fetch size only sets the number of rows requested per page. When you invoke all(), the driver transparently pages through all results.

Calling one() individually will not have a performance impact compared to all(). However, I would recommend changing your logic for consuming the page, as I would expect IntStream.range(1, pageSize) to fail if you've exhausted your result set (e.g. you set the fetch size to 500, but there are only 495 rows). Note also that the upper bound of IntStream.range is exclusive, so range(1, pageSize) runs only pageSize - 1 times. Instead you could use IntStream.range(0, resultSet.getAvailableWithoutFetching()).

You could also choose to iterate over the result set until ResultSet.isExhausted() returns true to prevent fetching the next page.
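
A minimal sketch of the getAvailableWithoutFetching() idea, not tested against a live cluster. To keep it runnable without the driver on the classpath, PagedRows is a hypothetical stand-in for the two Result<T> methods it relies on (one() and getAvailableWithoutFetching()), and SinglePage is an in-memory fake; with the real driver you would pass the Result<T> from mapper.map(resultSet) to equivalent logic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the two Result<T> methods used below.
interface PagedRows<T> {
    int getAvailableWithoutFetching(); // rows left in the current page
    T one();                           // next row, or null when exhausted
}

// In-memory fake that serves a single "page"; for illustration only.
final class SinglePage<T> implements PagedRows<T> {
    private final Deque<T> rows;
    SinglePage(List<T> rows) { this.rows = new ArrayDeque<>(rows); }
    @Override public int getAvailableWithoutFetching() { return rows.size(); }
    @Override public T one() { return rows.poll(); }
}

final class PageDrain {
    private PageDrain() {}

    // Reads at most pageSize rows and never more than the current page holds,
    // so no further page is requested and no null ends up in the list.
    static <T> List<T> drainPage(PagedRows<T> rows, int pageSize) {
        int count = Math.max(0, Math.min(pageSize, rows.getAvailableWithoutFetching()));
        List<T> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(rows.one());
        }
        return out;
    }
}
```

With 16 rows and a fetch size of 5, the last page would hold a single row, and drainPage would return a one-element list rather than padding it with nulls. Read the paging state from the execution info only after draining the page.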

Upvotes: 2
