Reputation: 88707
We are currently investigating ways to speed up our applications, and a large part of that is related to a list (actually a table) of entities.
Parameters and requirements
The parameters and requirements for that list are the following (I'll try to only mention relevant ones here):
To make the model a bit clearer consider the following simplified entity (as in JPA) model:
@Entity
class Car {
    @Id
    Long id;
    String manufacturer;
    String model;
    Date dateOfProduction;
    @ManyToMany
    List<TyreSize> allowedTyreSizes;
    @ElementCollection
    Set<Date> inspectionDates;
}
Please don't try to put too much meaning into that model since it's just meant to illustrate the problem (our data is different and much more complex).
A "complete" list of cars could then look like this:
+==============+=======+=======+===============+=============+
| Manufacturer | Model | Prod. | Allowed Tyres | Inspections |
+==============+=======+=======+===============+=============+
| BMW | 320d |01/2016| - 225/40 R18 | - 01/07/16 |
| | | | - 225/45 R17 | - 13/12/16 |
+--------------+-------+-------+---------------+-------------+
| Toyota | Camry |09/2016| - 185/70 R13 | - 31/12/16 |
+--------------+-------+-------+---------------+-------------+
Since the user can select the columns to be displayed at runtime we're building the necessary queries dynamically. That's all working quite well so far.
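To illustrate, dynamic column selection along these lines can be sketched with the JPA Criteria API (the `selectedColumns` input and the attribute names are hypothetical, not our actual code):

```java
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Tuple> query = cb.createTupleQuery();
Root<Car> car = query.from(Car.class);

// Build the select list from the columns the user picked at runtime,
// e.g. ["manufacturer", "model"] -- names must match entity attributes.
List<Selection<?>> columns = new ArrayList<>();
for (String name : selectedColumns) {
    columns.add(car.get(name).alias(name));
}
query.multiselect(columns);

List<Tuple> rows = entityManager.createQuery(query).getResultList();
```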
Basic problem
The problem we're having is performance once sorting and filtering are involved: our current approach is to load all data needed for sorting and filtering into memory, do the sorting and filtering there, and then keep a list of sorted ids and page on those. We are aware that this is somewhat slow, but so far performance was good enough to satisfy our management. Things have changed, though, since we now have more data to operate on and the performance requirements have gone up.
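For concreteness, the current in-memory approach can be sketched with plain collections (hypothetical field names; a `Row` tuple stands in for the projected entity data):

```java
import java.util.*;
import java.util.stream.*;

// A (id, sortKey) tuple standing in for the data loaded per entity.
record Row(long id, String manufacturer) {}

class InMemoryPaging {
    // Filter and sort everything in memory, keep only the sorted ids,
    // then page on that id list.
    static List<Long> page(List<Row> rows, String filter, int offset, int limit) {
        List<Long> sortedIds = rows.stream()
                .filter(r -> r.manufacturer().contains(filter)) // user-defined filter
                .sorted(Comparator.comparing(Row::manufacturer) // user-defined sort
                        .thenComparingLong(Row::id))            // stable tie-break
                .map(Row::id)
                .collect(Collectors.toList());
        int from = Math.min(offset, sortedIds.size());
        int to = Math.min(offset + limit, sortedIds.size());
        return sortedIds.subList(from, to); // ids of the current page
    }
}
```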
Thus we're investigating ways to improve sorting and filtering of all that data, and while we're currently following the approach of doing it on the database, I'll still ask this (side) question:
Currently we're using Postgresql and would like to keep using it if possible but if a different storage would be far better suited we'd at least check that out.
Current approach and question(s) (at the bottom)
As described above, we're currently trying to have the database sort, filter and paginate our data. It would be acceptable to use two queries: one to get the row ids for the current page and one to actually load the data for those rows.
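The second query can then stay simple: fetch by the id list and restore the page order in memory, since an SQL `IN (...)` lookup does not preserve the order of the ids (a sketch, assuming a `Car` entity with a `getId()` accessor):

```java
List<Long> pageIds = firstQueryResult; // ids from the first query, in page order

List<Car> cars = entityManager
        .createQuery("select c from Car c where c.id in :ids", Car.class)
        .setParameter("ids", pageIds)
        .getResultList();

// The database returns rows in arbitrary order, so re-sort by the id list.
Map<Long, Integer> position = new HashMap<>();
for (int i = 0; i < pageIds.size(); i++) {
    position.put(pageIds.get(i), i);
}
cars.sort(Comparator.comparingInt(c -> position.get(c.getId())));
```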
Since the challenge is the first query I'll concentrate on that:
AFAIK we could do something like this in SQL (using the car example above):
SELECT DISTINCT id FROM (
    SELECT c.id, ... FROM car c
    LEFT OUTER JOIN allowedtyresizes ats ON c.id = ats.car_id
    LEFT OUTER JOIN tyresizes ts ON ts.id = ats.tyresize_id
    ...             -- additional joins if required
    WHERE ...       -- apply any user-defined filters (or maybe put them into the joins)
    ORDER BY ...    -- apply any user-defined sorts
) AS sub
OFFSET ...          -- page offset
LIMIT ...           -- page size
In theory this query (it might not be entirely correct though) should provide the result we need to identify which rows to load for the current page.
Since we're using Hibernate (5.2 atm) we'd like to use HQL or the Criteria API to achieve that. However, it seems as if Hibernate doesn't support selecting from a subselect like the one above, so this might not be a viable approach. If we had to fall back to native SQL or an entirely different approach, so be it, but we'd prefer to make it work with the infrastructure that's currently in place.
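Should HQL/Criteria indeed not allow a subselect in the FROM clause, a native query would be one possible fallback that still runs through the existing Session (a sketch; the inner SQL would be the dynamically built query above):

```java
@SuppressWarnings("unchecked")
List<Number> pageIds = session.createNativeQuery(
        "SELECT DISTINCT id FROM ( /* dynamic joins, filters, sorts */ ) AS sub "
      + "OFFSET :pageOffset LIMIT :pageSize")
        .setParameter("pageOffset", pageOffset)
        .setParameter("pageSize", pageSize)
        .getResultList();
```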
So the questions are:
Upvotes: 3
Views: 1555
Reputation: 21103
I had a similar requirement at my past employer, and we noticed likewise that with smaller datasets the database was more than capable of handling it; however, there is a tipping point beyond which even the database will struggle.
My solution was to introduce Hibernate Search and its Elasticsearch integration, which stores the search data in Elasticsearch's Lucene-backed indexes; these are extremely fast for the kind of text-based queries and sorting you've described.
This allows you to continue to use the Hibernate ORM infrastructure you already have and bake these additional components into your architecture with minimal effort. The integration is seamless and certainly worth the investment, especially as your dataset continues to grow over time.
We were working with tens of millions of rows and had absolutely no performance issues; a query doing sorting and paging took less than 100 milliseconds on average.
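For reference, the mapping and a paged, sorted full-text query might look roughly like this with Hibernate Search 5.x (annotations and field names are illustrative, not taken from your model):

```java
@Entity
@Indexed
class Car {
    @Id @GeneratedValue
    Long id;

    @Field(analyze = Analyze.NO) // keep the raw value so it can be sorted on
    @SortableField
    String manufacturer;
    // ... further indexed fields
}

// Filter on "manufacturer", sort and page entirely in the index.
FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
QueryBuilder qb = ftem.getSearchFactory()
        .buildQueryBuilder().forEntity(Car.class).get();
org.apache.lucene.search.Query luceneQuery =
        qb.keyword().onField("manufacturer").matching("BMW").createQuery();

List<Car> page = ftem.createFullTextQuery(luceneQuery, Car.class)
        .setSort(new Sort(new SortField("manufacturer", SortField.Type.STRING)))
        .setFirstResult(0)   // page offset
        .setMaxResults(20)   // page size
        .getResultList();
```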
Upvotes: 2