Geoffrey De Vylder

Reputation: 4153

Bulk inserting existing data: Preventing JPA from doing a select before every insert

I'm working on a Spring Boot application that uses JPA (Hibernate) for the persistence layer.

I'm currently implementing migration functionality. We basically dump all existing entities of the system into an XML file. This export includes the ids of the entities as well.

The problem I'm having is on the other side: reimporting the existing data. In this step the XML is transformed back into Java objects, which are then persisted to the database.

When trying to save the entity, I'm using the merge method of the EntityManager class, which works: everything is saved successfully.

However when I turn on the query logging of Hibernate I see that before every insert query, a select query is executed to see if an entity with that id already exists. This is because the entity already has an id that I provided.

I understand this behavior and it actually makes sense. However, I'm sure that the ids will not exist, so the select is pointless in my case. I'm saving thousands of records, which means thousands of select queries on large tables, and that slows the import down drastically.

My question: Is there a way to turn this "checking if an entity exists before inserting" off?


Additional information:

When I use entityManager.persist() instead of merge, I get this exception:

org.hibernate.PersistentObjectException: detached entity passed to persist

To be able to use a supplied/provided id I use this id generator:

@Id
@GeneratedValue(generator = "use-id-or-generate")
@GenericGenerator(name = "use-id-or-generate", strategy = "be.stackoverflowexample.core.domain.UseIdOrGenerate")
@JsonIgnore
private String id;

The generator itself:

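import java.io.Serializable;
import java.util.Properties;

import org.hibernate.MappingException;
import org.hibernate.engine.spi.SessionImplementor;
import org.hibernate.id.UUIDGenerator;
import org.hibernate.service.ServiceRegistry;
import org.hibernate.type.Type;

public class UseIdOrGenerate extends UUIDGenerator {

    private String entityName;

    @Override
    public void configure(Type type, Properties params, ServiceRegistry serviceRegistry) throws MappingException {
        entityName = params.getProperty(ENTITY_NAME);
        super.configure(type, params, serviceRegistry);
    }

    @Override
    public Serializable generate(SessionImplementor session, Object object) {
        // Read the id that is already set on the entity, if any.
        Serializable id = session
            .getEntityPersister(entityName, object)
            .getIdentifier(object, session);

        // Keep a supplied id; generate a UUID only when none was provided.
        if (id == null) {
            return super.generate(session, object);
        } else {
            return id;
        }
    }
}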

Upvotes: 3

Views: 5751

Answers (2)

Bertrand88

Reputation: 739

I'm not sure whether you set the ID yourself or not. If you set it on the application side, check the answer here. I copied it below:

Here is the code of Spring's SimpleJpaRepository, which is what you are using when you go through a Spring Data repository:

@Transactional
public <S extends T> S save(S entity) {

    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

It does the following:

By default Spring Data JPA inspects the identifier property of the given entity. If the identifier property is null, then the entity will be assumed as new, otherwise as not new.

Link to Spring Data documentation

So if one of your entities has a non-null ID field, Spring treats it as not new and calls merge, which makes Hibernate run a SELECT first.

You can override this behavior in the two ways listed in the same documentation. An easy way is to make your entity implement Persistable (instead of Serializable), which requires you to implement the method "isNew".
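To illustrate the Persistable route, here is a minimal sketch that runs without Spring on the classpath. The nested `Persistable` interface is a stand-in that only mirrors the two methods of Spring Data's `org.springframework.data.domain.Persistable`, and `ImportedEntity`, `markNotNew()` and the `save` helper are hypothetical names. It assumes the entity tracks its "new" state in a flag that a real entity would mark `@Transient` and flip from `@PostPersist`/`@PostLoad` callbacks.

```java
class PersistableSketch {

    // Stand-in with the same two methods as Spring Data's Persistable<ID>,
    // declared here only so the sketch compiles on its own.
    interface Persistable<ID> {
        ID getId();
        boolean isNew();
    }

    // Hypothetical imported entity. The real class would carry @Entity and @Id,
    // the flag would be @Transient, and markNotNew() would be a
    // @PostPersist/@PostLoad callback.
    static class ImportedEntity implements Persistable<String> {
        private final String id;
        private boolean isNew = true;

        ImportedEntity(String id) {
            this.id = id;
        }

        @Override
        public String getId() {
            return id;
        }

        @Override
        public boolean isNew() {
            return isNew;
        }

        void markNotNew() {
            isNew = false;
        }
    }

    // Mirrors the branch in the save() method quoted above and reports
    // which EntityManager call it would choose.
    static String save(Persistable<?> entity) {
        return entity.isNew() ? "persist" : "merge";
    }

    public static void main(String[] args) {
        ImportedEntity entity = new ImportedEntity("id-from-xml");
        System.out.println(save(entity)); // persist, although the id is set
        entity.markNotNew();
        System.out.println(save(entity)); // merge
    }
}
```

The point of the flag is that the decision no longer depends on whether the id is null, so an entity that arrives with an id from the import can still take the persist branch.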

Upvotes: 1

Maciej Kowalski

Reputation: 26522

If you are certain that you will never update an existing entry in the database and that all entities will always be freshly inserted, then I would go for the persist operation instead of merge.

Per update

In that case (the id field being set up as auto-generated), the only way would be to remove the generation annotations from the id field and leave the configuration as:

@Id
@JsonIgnore
private String id;

So basically you set the id up to always be assigned manually. The persistence provider will then consider your entity transient even when the id is present, meaning that persist works and no extra selects are generated.
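With the generator removed, entities that arrive without an id need one assigned in application code before persist is called. The following is a rough sketch of that idea in plain Java, assuming a random UUID string is an acceptable replacement for what the removed generator produced. `ImportedEntity` and `ensureId()` are hypothetical names, not part of the original code.

```java
import java.util.UUID;

class AssignedIdSketch {

    // Hypothetical entity; the real class would carry @Entity,
    // with only @Id on the id field.
    static class ImportedEntity {
        private String id;

        ImportedEntity(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }

        // Keeps an id supplied by the import, otherwise assigns a random UUID.
        void ensureId() {
            if (id == null) {
                id = UUID.randomUUID().toString();
            }
        }
    }

    public static void main(String[] args) {
        ImportedEntity imported = new ImportedEntity("id-from-xml");
        imported.ensureId();
        System.out.println(imported.getId()); // id-from-xml

        ImportedEntity fresh = new ImportedEntity(null);
        fresh.ensureId();
        System.out.println(fresh.getId().length()); // 36
    }
}
```

Calling `ensureId()` right before `entityManager.persist(entity)` would keep the "use the supplied id or generate one" behavior while leaving the mapping as a plain assigned id.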

Upvotes: 2
