kupsef

Reputation: 3367

JPA persist becomes slower and slower

This scenario uses a simple one-to-many relationship with cascade persist in both directions.

Many:

@javax.persistence.Entity(name="Many")
public class Many {
    @javax.persistence.ManyToOne(cascade = CascadeType.PERSIST)
    protected One one;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long primaryKey;

    public void setM(One one) {
        this.one = one;
        // comment out this line and performance becomes stable
        this.one.getMany().add(this);
    }

    // other setters, getters, etc...
}

One:

@javax.persistence.Entity(name="One")
public class One {
    // mappedBy must name the owning-side field in Many, which is "one"
    @javax.persistence.OneToMany(mappedBy="one", cascade = CascadeType.PERSIST)
    protected java.util.Set<Many> many = com.google.common.collect.Sets.newHashSet();

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long primaryKey;

    private String name;

    // setters, getters, etc... 
}

Test:

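public static void main(String[] args) {
    // sw was not declared in the posted snippet; a Guava Stopwatch is assumed here
    Stopwatch sw = Stopwatch.createUnstarted();
    while(true) {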
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("test-pu");
        EntityManager em = emf.createEntityManager();

        for (int i = 0; i < 100; i++) {
            sw.reset();
            sw.start();
            persistMVs(emf, em);
            System.err.println("Elapsed: " + sw.elapsed(TimeUnit.MILLISECONDS) + " ms");
        }

        em.close();
        emf.close();
    }
}

private static void persistMVs(EntityManagerFactory emf, EntityManager em) {
    em.getTransaction().begin();
    One one = getOrCreateOne(em);

    for (int i = 0; i < 200; i++) {
        Many many = new Many();
        many.setM(one);
        em.persist(many);
    }
    em.getTransaction().commit();
}

The test is an endless loop. Each pass of the while loop inserts 20000 Many entities (100 transactions of 200 entities each), all associated with a single One entity. Each pass begins by creating a new EntityManagerFactory, to show the negative effect that the growing database has on performance.

I would expect the insertion time not to increase drastically; however, after each cycle of the while loop there is an order-of-magnitude increase.

Notes:

Why would the initial size of the database matter in this case? Should I consider this behavior as a bug?

Upvotes: 2

Views: 5103

Answers (2)

Chris

Reputation: 21190

Just to expand on Predrag's answer: traversing a 1:M relationship not only has the cost of bringing in the entities and expanding the object graph, but those entities also remain managed within the persistence context. Because your test reuses the same EntityManager for repeated transactions, the cache of managed entities keeps growing with each iteration. This cache must be traversed and checked for changes every time the context is synchronized with the database, which happens on flush, on transaction commit, and even before queries.

If you must bring in large object graphs, you can mitigate this either by releasing the EntityManager and obtaining a new one at each transaction boundary, or by occasionally flushing and clearing the EntityManager. Either option lets it release some of the managed entities, so it does not need to check them all for changes on each commit.
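To make the growth concrete, here is a minimal plain-Java model of the effect described above. It is only a sketch: `ToyPersistenceContext` and `ContextGrowthDemo` are hypothetical names, not part of JPA, and the "change check" is reduced to visiting each managed object once. It compares reusing one context across transactions with clearing it after every commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a persistence context. This is NOT the JPA API. */
final class ToyPersistenceContext {
    private final List<Object> managed = new ArrayList<>();

    /** Start tracking an entity, roughly what em.persist(...) does. */
    void persist(Object entity) {
        managed.add(entity);
    }

    /** Simulated commit: visits every managed entity once and returns how many were visited. */
    int commit() {
        int checked = 0;
        for (Object ignored : managed) {
            checked++; // a real provider would compare the entity state here
        }
        return checked;
    }

    /** Stop tracking everything, roughly what em.clear() does. */
    void clear() {
        managed.clear();
    }
}

final class ContextGrowthDemo {
    /** Runs simulated transactions and returns the number of entities checked at each commit. */
    static List<Integer> run(int transactions, int batch, boolean clearAfterCommit) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        List<Integer> checkedPerCommit = new ArrayList<>();
        for (int t = 0; t < transactions; t++) {
            for (int i = 0; i < batch; i++) {
                ctx.persist(new Object());
            }
            checkedPerCommit.add(ctx.commit());
            if (clearAfterCommit) {
                ctx.clear();
            }
        }
        return checkedPerCommit;
    }

    public static void main(String[] args) {
        System.out.println("reused context:  " + run(5, 200, false)); // [200, 400, 600, 800, 1000]
        System.out.println("cleared context: " + run(5, 200, true));  // [200, 200, 200, 200, 200]
    }
}
```

In the real test, the equivalent would be calling `em.clear()` after each commit, or creating a new EntityManager inside `persistMVs`. Note that clearing detaches the One instance as well, so it has to be looked up again in the next transaction.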

Edit> Your "Many" class has overridden the hashCode method and builds its hash code from the hash code of its referenced "One" combined with its own primary key. This causes each and every "Many" you persist in your loops to have the same hash code, because GenerationType.IDENTITY can only assign the key when the insert statement is executed, which happens during synchronization (flush/commit). On each persist call, the cascade persist makes the provider traverse the growing object model, and this hashCode method might be making the cache lookups done during that traversal take longer and longer.
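The following plain-Java sketch illustrates why identical hash codes hurt. It does not use JPA, and `UnsavedMany` is a hypothetical stand-in for the entity before its INSERT has run; the identity-based `equals` is an assumption, since the original `equals` is not shown. It counts `equals` calls instead of measuring time, so the result does not depend on the machine.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class HashCollisionDemo {
    /** Counts equals() calls so the lookup work is visible without timing anything. */
    static long equalsCalls = 0;

    /** Hypothetical stand-in for a Many entity whose IDENTITY key has not been assigned yet. */
    static final class UnsavedMany {
        final long oneKey;          // key of the referenced One
        final long primaryKey = 0;  // still 0 until the INSERT has been executed

        UnsavedMany(long oneKey) {
            this.oneKey = oneKey;
        }

        @Override
        public int hashCode() {
            // Same inputs for every unsaved instance, so the same hash code
            return Objects.hash(oneKey, primaryKey);
        }

        @Override
        public boolean equals(Object other) {
            equalsCalls++;
            return this == other; // assumption: identity equality
        }
    }

    /** Adds n unsaved instances to a HashSet and returns how many equals() calls were needed. */
    static long equalsCallsForInserts(int n) {
        equalsCalls = 0;
        Set<UnsavedMany> many = new HashSet<>();
        for (int i = 0; i < n; i++) {
            many.add(new UnsavedMany(1L));
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        for (int n : new int[] {200, 400, 800}) {
            System.out.println(n + " inserts needed " + equalsCallsForInserts(n) + " equals() calls");
        }
    }
}
```

Because every instance lands in the same bucket, each `add` has to be compared against the entries already there, so the total work grows roughly quadratically with the number of unsaved entities in the set.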

Upvotes: 4

Predrag Maric

Reputation: 24433

I think the problem is in this.one.getMany(), because in each iteration more and more entities need to be loaded from this relationship.

A @OneToMany relation is lazy by default, so when you call getMany() the JPA provider has to initialize every entity from the collection, which takes more time as the collection grows.

If you don't create a new EntityManagerFactory in each iteration, the entities from the last iteration remain in the cache, so far fewer queries are executed.

Upvotes: 2
