Tu Doan
Tu Doan

Reputation: 51

Spring JPA - Hibernate: Batch insert execute too much select nextval (‘sequence’)

Now I'm trying to enhance performance of my web application, I use spring JPA 2.3.0- Hibernate 5.4.15.Final, Postgres 12 and manage transaction by @Transaction. The web app is deployed on aws beanstalk, run multiple instances at the same time, but the database instance is not scalable. And I use bigSerial type for ID of tables.

For instance, I have a STUDENTS table, ID is bigSerial and some other columns. I got a problems when using

@GeneratedValue(strategy = GenerationType.IDENTITY)

, Hibernate couldn't batch insert when saving a list of entities. And I try to use

@GeneratedValue(strategy = GenerationType.AUTO, generator = "students_id_seq") 
@SequenceGenerator(name = "students_id_seq", sequenceName = "students_id_seq")

hibernate.id.new_generator_mappings=false
hibernate.jdbc.batch_size=10 
hibernate.order_inserts=true 
hibernate.order_updates=true
hibernate.batch_versioned_data=true

It seem Hibernate could batch insert, but the problem is Hibernate execute select nextval ('students_id_seq') multiple times. If an entity list has 30 items, Hibernate execute select nextval 30 times, and 3 times for batch insert query.

Some statistics:

In conclusion, If using GenerationType.AUTO or GenerationType.SEQUENCE with allocationSize = 1:


My question is, is there anyway to batch insert but not execute to many select nextval query ? Something likes GenerationType.IDENTITY, not execute select nextval, just batch insert and IDs will be handled by sequence in the database.

When I test with GenerationType.SEQUENCE and allocationSize=1 (GenerationType.AUTO), the application executes too much select nextval query, I think It is even worse than the IDENTITY strategy. And for some reasons, I don't want to use allocationSize, it may lead to duplicate primary key error when run insert query manual or when migrate data or some other cases.

After some research, I found a way to get a value list of a sequence:

select nextval ('students_id_seq') from generate_series(1,10);

We can replace 10 by entityList.size() or number of entities doesn't have ID in the entityList when batch insert, just get enough to use, don't create too much gap between IDs, but I'm not sure whether or not Hibernate supported, if yes, please share me the documentation to reference.

Thank you

https://discourse.hibernate.org/t/batch-insert-execute-too-much-select-nextval-sequence/4232

Upvotes: 5

Views: 6139

Answers (3)

ayman.mostafa
ayman.mostafa

Reputation: 619

I think you have to make the "allocationSize" setting in your Java entity the same as the sequence increment setting in the DB and the same as the batch size setting.
In that case, hibernate will call the sequence's next value once per batch (in the case above, once per every 40 records), generate the next 40 id values (without the need to call the DB), populate the entity with these new ID(s) then save the batch in one shot

Upvotes: 0

ayman.mostafa
ayman.mostafa

Reputation: 619

I would say that I have some experience of that point. I was doing insertion for more than 128,000 records. And my target was to enhance the time-consuming to do that. I would try to summarize the case as below :

  1. The code did not use any persist() or save() methods. These records were saved when the @Transactionl method exits

  2. I was using hibernate batch inserts below are the settings in the config map

    spring.jpa.properties.hibernate.jdbc.batch_size: "40" spring.jpa.properties.hibernate.order_inserts: "true" spring.jpa.properties.hibernate.order_updates: "true" spring.main.allow-bean-definition-overriding: "true"

  3. I modified the allocation size in my entity Id configuration as below:

@Id

@GeneratedValue(strategy = GenerationType.SEQUENCE, generator ="mappingentry_sql_generator")

@SequenceGenerator(name = "mappingentry_sql_generator", sequenceName ="mappingentry_id_seq", allocationSize = 40)

private Long id;

Notes: I set the "allocationSize" in the sequence generator to be equal to the "batch_size" value in the setting

  1. Also, I altered my sequence "mappingentry_id_seq" to increment by 40 as well

After doing these changes the time was reduced from 55 seconds to 20 seconds which was a great impact

The only thing that I did not understand is that when I checked the value of the id column that was generated by the sequence I did not find any value gap. Every ID value exceeded the previous by 1, not 40. That is what I am currently trying to understand

Upvotes: 0

Jens Schauder
Jens Schauder

Reputation: 81950

What you are looking for is the HiLo algorithm for id generation.

For every id generated from a sequence it generates a number of ids on the client without accessing the database.

You configure it on your entity as this:

    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo_sequence_generator")
    @GenericGenerator(
            name = "hilo_sequence_generator",
            strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
            parameters = {
                    @Parameter(name = "sequence_name", value = "hilo_seqeunce"),
                    @Parameter(name = "initial_value", value = "1"),
                    @Parameter(name = "increment_size", value = "3"),
                    @Parameter(name = "optimizer", value = "hilo")
            })
    @Id
    private Long id;

Upvotes: 2

Related Questions