Michal

Reputation: 2039

Cassandra pagination: start from a given / random position?

Is it possible to start pagination from a specified or random position?

Why do I need this?

On my production nodes I have a couple of parallel service jobs that iterate over roughly 200,000,000 items and update information for them. New versions of the software are pushed to the servers often, and with each push the service jobs are restarted, so all jobs start from the beginning again and again. Of course I use locks, but it would be better if I could instruct those parallel jobs to start from different pages.

Upvotes: 1

Views: 1052

Answers (1)

Andy Tolbert

Reputation: 11638

Apache Cassandra and its client drivers implement paging by exchanging a paging state, as described in section 8 of the native protocol specification:

However, if some results are not part of the first response, the Has_more_pages flag will be set and the result will contain a paging_state value. In that case, the paging_state value should be used in a QUERY or EXECUTE message (that has the same query as the original one or the behavior is undefined) to retrieve the next page of results.

As you query data, this paging state can be accessed and stored for later use, which is close to what you describe: a restarted job can pick up from a previously saved position instead of from the beginning.

This can be accomplished with the DataStax Java driver, as described in the manual on the 'Paging' page under the 'Saving and Reusing the paging state' section:

The driver exposes a PagingState object that represents where we were in the result set when the last page was fetched:

ResultSet resultSet = session.execute("your query");
// iterate the result set...
PagingState pagingState = resultSet.getExecutionInfo().getPagingState();

This object can be serialized to a String or a byte array:

String string = pagingState.toString();
byte[] bytes = pagingState.toBytes();

This serialized form can be saved in some form of persistent storage to be reused later. In our web service example, we would probably save the string version as a query parameter in the URL to the next page (http://myservice.com/results?page=<...>). When that value is retrieved later, we can deserialize it and reinject it in a statement:

PagingState pagingState = PagingState.fromString(string);
Statement st = new SimpleStatement("your query");
st.setPagingState(pagingState);
ResultSet rs = session.execute(st);

Other drivers should have a similar mechanism for paging in this way.
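For the restart scenario in the question, the "persistent storage" part could be as simple as one small file per job. The following is only a minimal sketch under that assumption: `PagingCheckpoint`, the `.paging-state` file suffix and the job name are hypothetical names made up for illustration and are not part of the driver. It uses only the Java standard library and stores whatever string `pagingState.toString()` produced.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/**
 * Hypothetical helper (not part of the DataStax driver): keeps the last
 * serialized paging state of one job in a file inside an existing directory.
 */
final class PagingCheckpoint {
    private final Path file;

    PagingCheckpoint(Path directory, String jobName) {
        // One file per job, e.g. <directory>/job-1.paging-state (assumed naming).
        this.file = directory.resolve(jobName + ".paging-state");
    }

    /** Stores the string form of the paging state, e.g. pagingState.toString(). */
    void save(String serializedPagingState) throws IOException {
        // Write to a temporary sibling first, then replace the old checkpoint,
        // so a restart in the middle of a write does not leave a truncated file.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, serializedPagingState.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Returns the saved paging state, or empty if the job has no checkpoint yet. */
    Optional<String> load() throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String saved = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return saved.isEmpty() ? Optional.empty() : Optional.of(saved);
    }

    /** Removes the checkpoint, e.g. once the job has processed the last page. */
    void clear() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

A job would call `load()` on startup and, if a value is present, pass it through `PagingState.fromString(...)` and `setPagingState(...)` as shown in the manual excerpt above; after each processed page it would call `save(...)` with the current paging state's string form. As the protocol excerpt notes, a saved state is only meaningful for the same query, so each job would need its own checkpoint and an unchanged query string.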

Upvotes: 3
