How to read Parquet results from S3 with pagination

Question

My results are stored in Amazon S3 in Parquet format.

My requirements are as follows:

I have an S3 bucket where I store my results as Parquet (multiple Parquet parts). I want to retrieve the results from all the parts.
I want to retrieve all rows (from all the parts) as they are. (Being able to query them would be nice.)
My desire to paginate comes from my environment, which is non-distributed. I have an EC2 instance that runs Java code to get the results. I need the results to be paginated so that the EC2 host does not crash while retrieving them.

Options I have looked into:

ListObjectsV2Request - I can't use this yet because we have not upgraded to AWS Java SDK 2.0.
S3 Select - since S3 Select needs the exact key of the object I want to retrieve, I would first have to list all the parts in S3 and then run S3 Select on each part to get the results. I am also not sure how I would paginate the input stream that S3 returns.
I am also looking into "Read parquet data from AWS s3 bucket", but I am not clear on how to paginate the results.
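
To make the flow I have in mind concrete (list the parts, then read each part in bounded pages), here is a minimal sketch that is independent of any S3 client. It is not an S3 implementation: `PagedPartReader`, `openPart`, and the part keys are hypothetical names used for illustration only. The assumption is that `openPart` would wrap whatever actually reads one part (for example an S3 Select call or a Parquet reader) and return an iterator over its rows, so that only one part is open at a time and at most `pageSize` rows are held in memory per page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Sketch: expose the rows of many parts as fixed-size pages.
 * Parts are opened lazily, one at a time, in the order their keys are given.
 */
class PagedPartReader<R> implements Iterator<List<R>> {
    private final Iterator<String> partKeys;
    // Hypothetical hook: opens one part by key and returns an iterator over its rows.
    private final Function<String, Iterator<R>> openPart;
    private final int pageSize;
    private Iterator<R> current = Collections.emptyIterator();

    PagedPartReader(List<String> partKeys,
                    Function<String, Iterator<R>> openPart,
                    int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.partKeys = partKeys.iterator();
        this.openPart = openPart;
        this.pageSize = pageSize;
    }

    // Move to the next part that still has rows; false once every part is exhausted.
    private boolean advance() {
        while (!current.hasNext()) {
            if (!partKeys.hasNext()) {
                return false;
            }
            current = openPart.apply(partKeys.next());
        }
        return true;
    }

    @Override
    public boolean hasNext() {
        return advance();
    }

    // Returns up to pageSize rows; a page may span the boundary between two parts.
    @Override
    public List<R> next() {
        if (!advance()) {
            throw new NoSuchElementException();
        }
        List<R> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && advance()) {
            page.add(current.next());
        }
        return page;
    }
}
```

The caller would loop with `while (reader.hasNext()) { process(reader.next()); }`, where `process` is a placeholder for whatever handles one page. What I still don't know is how to implement `openPart` against S3 so that a single part is itself streamed rather than loaded whole.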

Any input/help will be highly appreciated.

Answers (1)