Reputation: 105
Hi, we are using Hibernate Search along with Elasticsearch.
Indexing works as expected; however, we are seeing strange behaviour when paginating results.
    org.hibernate.Query hibQuery = fullTextSession
            .createFullTextQuery(query, Person.class)
            .setFirstResult(0)
            .setMaxResults(10);
    return hibQuery.list();
If we leave out setFirstResult(0).setMaxResults(10), we get 700 results, but with the two parameters set we get back 0 results.
Further research shows that the problem is in this section of code in QueryLoader in Hibernate Search:
    objectInitializer.initializeObjects(
            entityInfos,
            idToObjectMap,
            new ObjectInitializationContext( criteria, entityType, extendedIntegrator, timeoutManager, session )
    );
    ArrayList<Object> result = new ArrayList<>( idToObjectMap.size() );
    for ( Object o : idToObjectMap.values() ) {
        if ( o != ObjectInitializer.ENTITY_NOT_YET_INITIALIZED ) {
            result.add( o );
        }
    }
    return result;
In the above code, the condition
    if ( o != ObjectInitializer.ENTITY_NOT_YET_INITIALIZED )
evaluates to false for all the idToObjectMap entries, so nothing is added to the result.
Further research shows that Hibernate builds the query and the SQL looks correct, but in the QueryParameters object callable is set to false and the query is never executed.
Relevant libs:
compile "org.hibernate:hibernate-core:5.9.2.Final"
compile "org.hibernate:hibernate-search-orm:5.9.2.Final"
compile "org.hibernate:hibernate-search-elasticsearch:5.9.2.Final"
Any help with explaining why this happens and how to implement pagination correctly would be greatly appreciated.
Upvotes: 0
Views: 384
Reputation: 9977
This generally happens when entities are present in the index, but not in the database (anymore). In your case the first 10 results appear to be in your index, but not in your database.
The cause of this behavior is that Elasticsearch is "near real-time": after a change is made to the index, it takes some time (generally a few seconds) until the change is visible in search results. So if you deleted entities just a few milliseconds before searching, the index state could "lag" behind the database state.
If you are certain the entities still exist in the database, there might be a problem with your ID mapping, or with the specific query configuration you picked. Please show us the code of the Person class and give us the values you set for the properties hibernate.search.query.object_lookup_method and hibernate.search.query.database_retrieval_method, if you are not using the defaults.
If this is a problem when testing, you can set hibernate.search.default.elasticsearch.refresh_after_write to true. You should not set this in production, though, as this will dramatically decrease the performance of indexing.
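As a minimal sketch of the test-only setting above, the property can be collected in a map of overrides. The class name TestSearchSettings is hypothetical; only the property name comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper holding configuration overrides meant for tests only.
final class TestSearchSettings {

    // Returns the extra settings to apply on top of the regular configuration.
    static Map<String, Object> overrides() {
        Map<String, Object> settings = new HashMap<>();
        // Make index writes visible to searches immediately (slow; tests only).
        settings.put("hibernate.search.default.elasticsearch.refresh_after_write", "true");
        return settings;
    }
}
```

With JPA bootstrapping, such a map could be passed to Persistence.createEntityManagerFactory(unitName, map); putting the same key and value in a test-specific hibernate.properties file should work as well.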
If this is a problem in production, and you need to solve it efficiently, it will be more difficult. The only solution I can think of is moving from pagination by index to pagination by key. However, you will lose the ability to go to a page directly, and you will not be able to sort the results any way you want.
You will need to find a strictly monotonic key in your results, i.e. a field that is guaranteed to be unique for each result and to always increase (or always decrease) when you go to the next result. An id would be a good candidate, if you sort by id. A creation date could work too, if it's precise enough and you sort by this creation date.
You will use this key to ignore the previous pages: the client will not send the page number to the server, it will send the last value for the "strictly monotonic" key, and you will just add a predicate like this to your query: queryBuilder.range().onField("myKey").above(<the last value for the key in the previous page>).excludeLimit().createQuery(). The excludeLimit() call makes the bound exclusive, so the last result of the previous page is not returned again.
Then, instead of returning the results of your query directly, you will execute the query multiple times, accumulating the results in a list until it reaches the appropriate page size (or until getResultSize returns 0).
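The accumulation loop described above could look roughly like the following sketch. It is not Hibernate Search code: Hit, KeyedSearch and KeysetPager are hypothetical names, and KeyedSearch stands in for "run the full-text query with key strictly above lastKey, sorted by key ascending, limited to a number of hits". It also assumes you can read the key of every hit, including hits whose entity is no longer in the database (represented here by a null entity), so that the loop can move past them.

```java
import java.util.ArrayList;
import java.util.List;

// One search hit: the sort key stored in the index, plus the entity loaded
// from the database (null when the index entry is stale).
final class Hit<T> {
    final long key;
    final T entity;

    Hit(long key, T entity) {
        this.key = key;
        this.entity = entity;
    }
}

// Hypothetical stand-in for the real query: hits with key > lastKey,
// sorted by key ascending, at most `limit` of them.
interface KeyedSearch<T> {
    List<Hit<T>> hitsAfter(long lastKey, int limit);
}

final class KeysetPager {

    // Collects up to pageSize entities that still exist in the database,
    // starting strictly after lastKey.
    static <T> List<T> nextPage(KeyedSearch<T> search, long lastKey, int pageSize) {
        List<T> page = new ArrayList<>();
        long cursor = lastKey;
        while (page.size() < pageSize) {
            List<Hit<T>> hits = search.hitsAfter(cursor, pageSize - page.size());
            if (hits.isEmpty()) {
                break; // nothing left in the index above the cursor
            }
            for (Hit<T> hit : hits) {
                cursor = hit.key; // always advance, even past stale hits
                if (hit.entity != null) {
                    page.add(hit.entity);
                }
            }
        }
        return page;
    }
}
```

The client would then send back the key of the last entity it received, and the server would call nextPage with that value to build the following page.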
EDIT: Another solution, perhaps easier, but one that will only reduce the likelihood of this problem, not remove it completely.
You could ensure that Elasticsearch refreshes its indexes more frequently by setting index.refresh_interval to something shorter than the default (1s) for all indexes. Note this could have a very bad impact on the performance of your Elasticsearch cluster, depending on how often you write to the cluster.
In order to apply the setting to all indexes, the easiest solution is to create index templates before Hibernate Search creates the indexes.
Upvotes: 1