Relacks

Reputation: 21

Hibernate-Search flushToIndexes causing java.lang.OutOfMemoryError (heap space)

I have a Spring application using Hibernate, connected to Elasticsearch through Hibernate Search.
To keep the example simple, I'll only include the relevant annotations and code.

I have an entity A, referenced by many B entities (a lot, actually: ~8000).
The B entities also contain a lot of embedded details (entities C, E, ...).
These entities are all connected with the @IndexedEmbedded and @ContainedIn Hibernate Search annotations (see the example below).
I've created a service that modifies a field of an A object and forces the flush through flushToIndexes.

On flush, Hibernate Search updates the A index and, because of the @ContainedIn, propagates the update to the ~8000 B documents. But to update those, for some reason, Hibernate Search loads all 8000 B objects linked to the A object at once, along with every detail contained in those B objects (C, E, and so on).
All this takes a long time and ends in nothing but java.lang.OutOfMemoryError: Java heap space.


@Entity
@Table(name = "A")
@Indexed
public class A {

    @ContainedIn 
    @OneToMany(fetch = FetchType.LAZY, mappedBy = "a") 
    private Set<B> bCollection;

    @Field
    @Column(name = "SOME_FIELD")
    private String someField;                            // Value updated in the service
}

@Entity
@Table(name = "B")
@Indexed
public class B {

    @IndexedEmbedded
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "A_ID")
    private A a;

    @IndexedEmbedded
    @OneToOne(fetch = FetchType.LAZY, mappedBy = "b")
    @Fetch(FetchMode.JOIN)  
    private C c;                                         // Some other details

    @IndexedEmbedded
    @OneToMany(fetch = FetchType.LAZY, mappedBy = "b")
    private Set<E> eCollection;                          // Some other details
}

// My service
aObject.setSomeField("some value");
fullTextSession.flushToIndexes();

Increasing the JVM's allocated memory (from 8 GB to 24 GB, which is actually a lot for ~10000 objects) didn't solve anything, so I presume loading the whole dataset requires more than 24 GB...

However, the problem seems more complicated than it looks.
Is this a bug? Is it common? What did I do wrong? How can I solve it?
Is there some hidden Hibernate Search configuration to avoid this behaviour?

Upvotes: 2

Views: 290

Answers (1)

yrodiere

Reputation: 9977

It is a limitation of Hibernate Search. @ContainedIn will perform relatively well only for small associations; large ones such as yours will indeed trigger the loading of all associated entities and will perform badly, or in the worst cases trigger OOM.

It hasn't been fixed yet because the problem is rather complex. We would need to use queries instead of associations for the @ContainedIn (HSEARCH-1937), which would be rather simple. But more importantly, we would need to perform chunking (periodic flush/clear), which would either have side effects on the user session or have to be performed outside of the user transaction (HSEARCH-2364), both of which may have nasty consequences.

The workaround would be not to add the @ContainedIn on A.bCollection, and to handle reindexing manually: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#manual-index-changes
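In code, with the @ContainedIn removed, the manual reindexing could look roughly like this (an untested sketch against the Hibernate Search 5 ORM API; I'm assuming you load the affected B entities yourself, e.g. with "from B b where b.a = :a"):

// Reindex the B documents affected by a change to A, without @ContainedIn.
// 'affectedBs' is assumed to come from a query you write yourself.
public void reindexAffectedBs(Session session, List<B> affectedBs) {
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    for (B b : affectedBs) {
        fullTextSession.index(b);       // queue an index update for this B only
    }
    fullTextSession.flushToIndexes();   // push the queued work to Elasticsearch
    fullTextSession.clear();            // detach B/C/E instances to free heap
}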

Similarly to what I mentioned in another answer, you can adopt one of two strategies:

  1. The easy path: reindex all the B entities periodically using the mass indexer, e.g. every night.
  2. The hard path: whenever an A changes, save the information "this entity changed" somewhere (this could be as simple as storing a "last update date/time" on entity A, or adding a row in an event table). In parallel, have a periodic process inspect the changes, load the affected entities of type B, and reindex them. Preferably do that in batches of manageable size, one transaction per batch if you can (that will avoid some headaches).

The first solution is fairly simple, but has the big disadvantage that the B index will be up to 24 hours out of date. Depending on your use case, that may or may not be acceptable. It also may not be feasible if you have many entities of type B (read: millions) and full reindexing takes more than just a few minutes.
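For reference, the mass indexer call for option 1 could be as simple as this (a sketch; the tuning values are just examples, not recommendations):

// Nightly job: rebuild the whole B index from the database.
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.createIndexer(B.class)
        .batchSizeToLoadObjects(25)   // how many B entities to load per batch
        .threadsToLoadObjects(4)      // parallel loading threads
        .startAndWait();              // blocks until done (throws InterruptedException)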

The second solution is prone to errors and you would basically be doing Hibernate Search's work, but it would work even for very large tables, and the delay between the database change and the reindexing would be much shorter.
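A rough sketch of such a periodic process, assuming you store a lastUpdate timestamp on A (names and the batch size are illustrative, and you would ideally run one transaction per batch):

// Periodic job: reindex the B entities whose parent A changed since the last run.
public void reindexChangedBs(Session session, Date lastRun) {
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    int batchSize = 100;
    int offset = 0;
    List<B> batch;
    do {
        batch = session.createQuery(
                    "select b from B b where b.a.lastUpdate > :lastRun order by b.id", B.class)
                .setParameter("lastRun", lastRun)
                .setFirstResult(offset)
                .setMaxResults(batchSize)
                .list();
        for (B b : batch) {
            fullTextSession.index(b);        // queue the index update
        }
        fullTextSession.flushToIndexes();    // write this batch to Elasticsearch
        fullTextSession.clear();             // drop B/C/E from the persistence context
        offset += batch.size();
    } while (!batch.isEmpty());
}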

Upvotes: 1
