Josh
Josh

Reputation: 1

How to handle JPA repository 'findby' with large data sets

I have a method in my JPA repository class that looks like this:

List<Claim> findAllByClaimNumberIn(Set<String> claimNumbers);

Everything works perfectly if the set of claim numbers being passed in is less than or equal to 1000. However if the number of claim numbers being passed in exceeds 1000, I get an error complaining that "maximum number of expressions in a list is 1000". Is there an easy workaround? Pagination doesn't work since the set of claim numbers is still greater than 1000. The only solution I can think of is to break the set of claim numbers into chunks of 1000 and query the database multiple times. Any suggestions?

Upvotes: 0

Views: 1460

Answers (1)

hjoeren
hjoeren

Reputation: 589

This is a database matter, so: One way is to split the original set into chunks, but:

You don't have to query the database X times. Instead, you can build one "big (dynamic) query" in the form:

SELECT claim.*
FROM claim
WHERE
    claim.claimNumber IN (<chunk_1>) OR
    claim.claimNumber IN (<chunk_2>) OR
    -- ...
    claim.claimNumber IN (<chunk_N>)

To do this, you can create and use a specification (see https://spring.io/blog/2011/04/26/advanced-spring-data-jpa-specifications-and-querydsl/) to build the query dynamically:

(I changes Set to List for easier partitioning of the list)

public class ClaimSpecs {

  public static Specification<Claim> claimNumberSpec(List<String> claimNumbers) {
    return new Specification<Claim>() {

      @Override
      public Predicate toPredicate(Root<Claim> root, CriteriaQuery<?> query,
          CriteriaBuilder criteriaBuilder) {
        if (claimNumbers.size() <= 1000) {
          // 1.
          return root.get("claimNumber").in(claimNumbers);
        } else {
          // 2.1.
          final int CHUNK_SIZE = 1000;
          List<List<String>> chunks = new ArrayList<>();
          for (int i = 0; i < claimNumbers.size(); i += CHUNK_SIZE) {
            chunks.add(claimNumbers.subList(i, Math.min(claimNumbers.size(), i + CHUNK_SIZE)));
          }
          // 2.2.
          Predicate predicate = criteriaBuilder.conjunction();
          for (List<String> chunk : chunks) {
            predicate = criteriaBuilder.or(predicate, root.get("claimNumber").in(chunk));
          }
          return predicate;
        }
      }
    };
  }

}
  1. If the list size of claim numbers is lesser or equal 1000, everything is fine and only one predicate is sufficient.

  2. Otherwise

2.1. the list of claim numbers has to be partitioned to sub lists of size 1000 (or smaller) (taken from https://stackoverflow.com/a/2895365/2201165) and

2.2. multiple predicates has to be created based on the created chunks and concatenated with OR.


To use this specification (as parameter in the repository methods), your repository has to extend JpaSpecificationExecutor and looks like this:

public interface ClaimRepository
    extends JpaRepository<Claim, Long>, JpaSpecificationExecutor<Claim> {

}

Given that, you are able to call this repository like this:

@GetMapping                                                                 
public List<Claim> findAllClaimsByClaimNumberIn(List<String> claimNumbers) {
  return claimRepository.findAll(ClaimSpecs.claimNumberSpec(claimNumbers)); 
}                                                                           

Upvotes: 1

Related Questions