Reputation: 966
I have a table in Cassandra that contains about 500M records. I need to iterate over all of these records, do some processing, and then insert the processed ones into other tables. Because of memory constraints, I need to retrieve the records in chunks. Is there a way to do that? For example, in the first chunk I would get the first 2M records, in the second chunk the next 2M, and so on.
Upvotes: 4
Views: 1339
Reputation: 9953
If you just run a regular execute method you get back a ResultSet. As noted in the docs:
The retrieval of the rows of a ResultSet is generally paged (a first page of result is fetched and the next one is only fetched once all the results of the first one has been consumed). The size of the pages can be configured either globally through QueryOptions.setFetchSize(int) or per-statement with Statement.setFetchSize(int). Though new pages are automatically (and transparently) fetched when needed, it is possible to force the retrieval of the next page early through fetchMoreResults().
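For reference, here is what both of those knobs look like with the 3.x Java driver; this is a minimal sketch, and the contact point, keyspace, and table name are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class FetchSizeConfig {
    public static void main(String[] args) {
        // Globally, when building the Cluster:
        QueryOptions opts = new QueryOptions().setFetchSize(10_000);
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")   // placeholder contact point
                .withQueryOptions(opts)
                .build();

        // Or per statement, overriding the global default:
        Statement stmt = new SimpleStatement("SELECT * FROM my_keyspace.my_table")
                .setFetchSize(10_000);

        cluster.close();
    }
}
```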
So you can just run a query asking for all the data and set the fetch size to your chunk size. Then iterate through your ResultSet until you've got your chunk of records, process, insert, then start iterating again.
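An end-to-end sketch of that loop, again assuming the DataStax Java driver 3.x; my_keyspace, source_table, and processAndInsert are made-up placeholders for your schema and processing step:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

import java.util.ArrayList;
import java.util.List;

public class ChunkedScan {
    // 2M rows per chunk, as in the question; shrink this if a chunk of
    // materialized Row objects is itself too large for your heap.
    private static final int CHUNK_SIZE = 2_000_000;

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            // One query over the whole table; the driver pages through it
            // transparently, holding at most ~fetchSize rows per page.
            Statement stmt = new SimpleStatement("SELECT * FROM source_table")
                    .setFetchSize(10_000);
            ResultSet rs = session.execute(stmt);

            List<Row> chunk = new ArrayList<>();
            for (Row row : rs) {               // next pages fetched as needed
                chunk.add(row);
                if (chunk.size() == CHUNK_SIZE) {
                    processAndInsert(session, chunk);
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                processAndInsert(session, chunk);  // trailing partial chunk
            }
        }
    }

    // Placeholder: transform the rows and insert them into the other tables.
    private static void processAndInsert(Session session, List<Row> rows) {
        // ... your processing and INSERT statements go here ...
    }
}
```

Note one deliberate deviation in the sketch: the driver's page size (fetch size) and your processing chunk size don't have to be equal. Here the fetch size stays modest while the loop accumulates the 2M-row chunk itself, so the driver's page buffer doesn't double the memory already used by the chunk.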
Upvotes: 3