Sarthak Mittal
Sarthak Mittal

Reputation: 155

Implement Keyset paging in a reader for a spring batch job

I have a spring batch job that fetches the data from a Postgres database and after processing writes it on excel sheets. But I want to implement a keyset paging in a reader for a spring batch job. Currently, I am using JpaPagingItemReader which uses limit offset paging but because I am dealing with a large quantity of data, the query which the JpaPagingItemReader uses to fetch the data becomes inefficient as the offset increases. The keyset paging can be used to avoid the limitations of limit offset pagination but I don't know how to implement a reader with keyset pagination. How can I implement it?

Edit: Keyset pagination does not include the offsetting/skipping of records, instead, we will be ordering and tracking a numerical unique identifier in the result and requesting for entries greater than the last unique entry. In this method, the SQL would be something like below ( assuming customer_id is the unique auto-generated identifier for the records )

select * from CUSTOMERS where status = 'ACTIVE' and customer_id > 0 order by customer_id asc limit 100;

-- Second iteration ( size = 100, lastCustomerId = 100 ) 
select * from CUSTOMERS where status = 'ACTIVE' and customer_id > 100 order by customer_id asc limit 100;

-- Second iteration ( size = 100, lastCustomerId = 200 ) 
select * from CUSTOMERS where status = 'ACTIVE' and customer_id > 200 order by customer_id asc limit 100;

The points that need to be kept in mind while implementing keyset pagination are:

  1. There should be a numerical unique identifier for each record ( preferably a primary key )
  2. The result set should be ordered
  3. We should have the logic to sort and find the largest id in the retrieved list.
  4. There should be an index on the identifier field which you are using for peeking.
public class CustomerProcessorService {
    public void processCustomers() {
        List<Customer> customers = new ArrayList();
        long lastCusId = 0;
        int size = 100;
        while ( true ) {
            // Create a PageRequest object that will be passed as Pageable interface to repo
            // Note that here we are setting 0 as the offset
            PageRequest pageRequest = new PageRequest(0,size);
            // Get the lastCusId 
            lastCusId = getLastCusId(customers);
            // Get the data from the database
            customers = customerRepository.findByStatusAndCustomerIdGreaterThanOrderByCustomerIdAsc('ACTIVE',lastCusId,pageRequest);
            // Check if data is there
            if ( customers == null || customers.isEmpty()) {
                break;
            }
            // Do the processing
        } 
    }
    public Long getLastCusId(List<Customer> customers) {
        // If passed entry is null or empty, return 0 ( This handles the first iteration case ) 
        if ( customers == null || customers.isEmpty()) 
            return 0l;
        
        // Do the logic to sort the customers list by customer_id of each
        // Customer object
        // Return the last entry 
        return customers.get(customers.size() -1).getCustomerId();
    }

Upvotes: 1

Views: 1393

Answers (1)

Mahmoud Ben Hassine
Mahmoud Ben Hassine

Reputation: 31600

You should be able to implement your paging logic by extending the AbstractPaginatedDataItemReader class. This base class handles most of the paging boilerplate and allows you to specify the paging logic in doPageRead. Here is a quick example, I will let you adapt it accordingly:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.data.AbstractPaginatedDataItemReader;
import org.springframework.data.domain.PageRequest;

public class KeySetPagingItemReader extends AbstractPaginatedDataItemReader<Customer> {

    long lastCusId = 0;
    int size = 100;
    private CustomerRepository customerRepository;
    List<Customer> customers = new ArrayList();

    public KeySetPagingItemReader(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    @Override
    protected Iterator<Customer> doPageRead() {
        PageRequest pageRequest = PageRequest.of(0,size);
        // Get the lastCusId
        lastCusId = getLastCusId(customers);
        // Get the data from the database
        customers = customerRepository.findByStatusAndCustomerIdGreaterThanOrderByCustomerIdAsc('ACTIVE',lastCusId,pageRequest);
        return customers.iterator();
    }

    public Long getLastCusId(List<Customer> customers) {
        // If passed entry is null or empty, return 0 ( This handles the first iteration case )
        if ( customers == null || customers.isEmpty())
            return 0l;

        // Do the logic to sort the customers list by customer_id of each
        // Customer object
        // Return the last entry
        return customers.get(customers.size() -1).getCustomerId();
    }
}

Upvotes: 4

Related Questions