Reputation: 88707
We are currently investigating ways to speed up our applications, and a large part of that is related to a list (actually a table) of entities.
Parameters and requirements
The parameters and requirements for that list are the following (I'll try to only mention relevant ones here):
To make the model a bit clearer consider the following simplified entity (as in JPA) model:
@Entity
class Car {
    @Id
    Long id;
    String manufacturer;
    String model;
    Date dateOfProduction;
    @ManyToMany
    List<TyreSize> allowedTyreSizes;
    @ElementCollection
    Set<Date> inspectionDates;
}
Please don't try to put too much meaning into that model since it's just meant to illustrate the problem (our data is different and much more complex).
A "complete" list of cars could then look like this:
+==============+=======+=======+===============+=============+
| Manufacturer | Model | Prod. | Allowed Tyres | Inspections |
+==============+=======+=======+===============+=============+
| BMW | 320d |01/2016| - 225/40 R18 | - 01/07/16 |
| | | | - 225/45 R17 | - 13/12/16 |
+--------------+-------+-------+---------------+-------------+
| Toyota | Camry |09/2016| - 185/70 R13 | - 31/12/16 |
+--------------+-------+-------+---------------+-------------+
Since the user can select the columns to be displayed at runtime we're building the necessary queries dynamically. That's all working quite well so far.
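To illustrate, dynamic column selection along these lines can be sketched with the JPA Criteria API (the `selectedColumns` input and the attribute names are hypothetical, not our actual code):

```java
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Tuple> query = cb.createTupleQuery();
Root<Car> car = query.from(Car.class);

// Build the select list from the columns the user picked at runtime,
// e.g. ["manufacturer", "model"] -- names must match entity attributes.
List<Selection<?>> columns = new ArrayList<>();
for (String name : selectedColumns) {
    columns.add(car.get(name).alias(name));
}
query.multiselect(columns);

List<Tuple> rows = entityManager.createQuery(query).getResultList();
```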
Basic problem
The problem we're having is performance once sorting and filtering are involved: our current approach is to load all data needed for sorting and filtering into memory, do the sorting and filtering there, and then keep a list of sorted ids and page on those. We are aware that this is somewhat slow, but so far performance was good enough to satisfy our management. Things have changed, though, since we now have more data to operate on and the performance requirements have gone up.
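For concreteness, the current in-memory approach can be sketched with plain collections (hypothetical field names; a `Row` tuple stands in for the projected entity data):

```java
import java.util.*;
import java.util.stream.*;

// A (id, sortKey) tuple standing in for the data loaded per entity.
record Row(long id, String manufacturer) {}

class InMemoryPaging {
    // Filter and sort everything in memory, keep only the sorted ids,
    // then page on that id list.
    static List<Long> page(List<Row> rows, String filter, int offset, int limit) {
        List<Long> sortedIds = rows.stream()
                .filter(r -> r.manufacturer().contains(filter)) // user-defined filter
                .sorted(Comparator.comparing(Row::manufacturer) // user-defined sort
                        .thenComparingLong(Row::id))            // stable tie-break
                .map(Row::id)
                .collect(Collectors.toList());
        int from = Math.min(offset, sortedIds.size());
        int to = Math.min(offset + limit, sortedIds.size());
        return sortedIds.subList(from, to); // ids of the current page
    }
}
```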
Thus we're investigating ways to improve sorting and filtering of all that data, and while we're currently following the approach of doing it on the database, I'll still ask this (side) question:
Currently we're using Postgresql and would like to keep using it if possible but if a different storage would be far better suited we'd at least check that out.
Current approach and question(s) (at the bottom)
As described above, we're currently trying to have the database sort, filter and paginate our data. It would be acceptable to use two queries: one to get the row ids for the current page and one to actually load the data for those rows.
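The second query can then stay simple: fetch by the id list and restore the page order in memory, since an SQL `IN (...)` lookup does not preserve the order of the ids (a sketch, assuming a `Car` entity with a `getId()` accessor):

```java
List<Long> pageIds = firstQueryResult; // ids from the first query, in page order

List<Car> cars = entityManager
        .createQuery("select c from Car c where c.id in :ids", Car.class)
        .setParameter("ids", pageIds)
        .getResultList();

// The database returns rows in arbitrary order, so re-sort by the id list.
Map<Long, Integer> position = new HashMap<>();
for (int i = 0; i < pageIds.size(); i++) {
    position.put(pageIds.get(i), i);
}
cars.sort(Comparator.comparingInt(c -> position.get(c.getId())));
```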
Since the challenge is the first query I'll concentrate on that:
AFAIK we could do something like this in SQL (using the car example above):
SELECT DISTINCT id FROM (
    SELECT c.id, ... FROM car c
    LEFT OUTER JOIN allowedtyresizes ats ON c.id = ats.car_id
    LEFT OUTER JOIN tyresizes ts ON ts.id = ats.tyresize_id
    ...             -- additional joins if required
    WHERE ...       -- apply any user-defined filters (or maybe put them into the joins)
    ORDER BY ...    -- apply any user-defined sorts
) AS sub
OFFSET ...          -- page offset
LIMIT ...           -- page size
In theory this query (it might not be entirely correct though) should provide the result we need to identify which rows to load for the current page.
Since we're using Hibernate (5.2 atm) we'd like to use HQL or the Criteria API to achieve that. However, it seems as if Hibernate doesn't support selecting from a subselect like the one above, so this might not be a viable approach. If we had to fall back to native SQL or an entirely different approach, so be it, but we'd prefer to make it work with the infrastructure that's currently in place.
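Should HQL/Criteria indeed not allow a subselect in the FROM clause, a native query would be one possible fallback that still runs through the existing Session (a sketch; the inner SQL would be the dynamically built query above):

```java
@SuppressWarnings("unchecked")
List<Number> pageIds = session.createNativeQuery(
        "SELECT DISTINCT id FROM ( /* dynamic joins, filters, sorts */ ) AS sub "
      + "OFFSET :pageOffset LIMIT :pageSize")
        .setParameter("pageOffset", pageOffset)
        .setParameter("pageSize", pageSize)
        .getResultList();
```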
So the questions are:
Upvotes: 3
Views: 1555
Reputation: 21103
I had a similar requirement at my past employer, and we noticed likewise that with smaller datasets the database was more than capable of handling it; however, there is a tipping point beyond which even the database will struggle.
My solution was to introduce Hibernate Search and its Elasticsearch integration, which stores the search data in Elasticsearch's Lucene-backed indexes; these are extremely fast for the kind of text-based queries and sorting you've described.
This allows you to continue to use the Hibernate ORM infrastructure you already have and bake these additional components into your architecture with minimal effort. The integration is seamless and certainly worth the investment, especially as your dataset continues to grow over time.
We were working with tens of millions of rows and had absolutely no performance issues; a query doing sorting and paging took less than 100 milliseconds on average.
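For reference, the mapping and a paged, sorted full-text query might look roughly like this with Hibernate Search 5.x (annotations and field names are illustrative, not taken from your model):

```java
@Entity
@Indexed
class Car {
    @Id @GeneratedValue
    Long id;

    @Field(analyze = Analyze.NO) // keep the raw value so it can be sorted on
    @SortableField
    String manufacturer;
    // ... further indexed fields
}

// Filter on "manufacturer", sort and page entirely in the index.
FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
QueryBuilder qb = ftem.getSearchFactory()
        .buildQueryBuilder().forEntity(Car.class).get();
org.apache.lucene.search.Query luceneQuery =
        qb.keyword().onField("manufacturer").matching("BMW").createQuery();

List<Car> page = ftem.createFullTextQuery(luceneQuery, Car.class)
        .setSort(new Sort(new SortField("manufacturer", SortField.Type.STRING)))
        .setFirstResult(0)   // page offset
        .setMaxResults(20)   // page size
        .getResultList();
```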
Upvotes: 2