Emmanuel Touzery

Reputation: 9183

JPA sequence ID generation

There is tremendous value in assigning an ID to an entity in its constructor, rather than waiting until it is persisted to the database: your equals/hashCode implementation becomes trivial, and it saves many headaches.

I have seen problems when entity equality is based on ==: a proxy gets into the session, and once it is unwrapped into the real object, equals() returns false.

And when you override equals and hashCode to use the generated ID, that ID is only generated on persist(), so all non-persisted entities have a null id and are therefore all equal to one another.
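The null-id problem can be reproduced without a database. The following is a minimal sketch, assuming a hypothetical DraftCustomer class whose id stays null until it is persisted; JPA annotations are left out so that it compiles on its own.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity; JPA annotations are omitted to keep the sketch self-contained.
class DraftCustomer {
    private Long id;            // stays null until the persistence provider assigns it
    private final String name;

    DraftCustomer(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DraftCustomer)) return false;
        // Two null ids compare as equal here, which is the pitfall.
        return Objects.equals(id, ((DraftCustomer) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 for every instance whose id is still null
    }
}

final class NullIdPitfall {
    // Adds two distinct, not-yet-persisted customers to a set and reports its size.
    static int distinctUnsaved() {
        Set<DraftCustomer> unsaved = new HashSet<>();
        unsaved.add(new DraftCustomer("Alice"));
        unsaved.add(new DraftCustomer("Bob"));
        return unsaved.size(); // the second add is rejected as a duplicate
    }
}
```

Calling NullIdPitfall.distinctUnsaved() returns 1 rather than 2, because the set treats the second unsaved customer as a duplicate of the first.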

From what I have read, when you use a traditional ID generation technique (auto-increment, say), the ID is generated when the entity manager is flushed. When you use a sequence-based solution, it is generated at persist time.

That article and my current understanding say the simplest solution is to assign an ID at creation time, not at persist or flush time. With sequences that appears achievable, but JPA decided against it. Since obtaining IDs from a sequence is cheap (you can prefetch them), why has JPA not provided at least an option to obtain a sequence-based ID at object construction time? There is a risk of wasting some IDs if the entity is never actually persisted, but I don't think that is a big problem.
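To make the idea concrete, here is a rough sketch of what construction-time allocation from a prefetched block could look like outside of JPA. BlockIdAllocator is a hypothetical helper, not a JPA or Hibernate class, and the LongSupplier stands in for whatever call fetches the next value of a database sequence. The sketch assumes the sequence is incremented by the block size, so each fetched value is the start of a fresh block.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of JPA or Hibernate): hands out IDs from
 * blocks reserved up front, so an entity can receive its ID in the constructor.
 */
final class BlockIdAllocator {
    private final LongSupplier nextBlockStart; // stand-in for a "next sequence value" query
    private final int blockSize;               // assumed to match the sequence increment
    private long next;                         // next ID to hand out
    private long limit;                        // exclusive end of the current block

    BlockIdAllocator(LongSupplier nextBlockStart, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.nextBlockStart = nextBlockStart;
        this.blockSize = blockSize;
        // next == limit initially, so the first nextId() call reserves a block.
    }

    synchronized long nextId() {
        if (next == limit) {
            // Current block is used up (or none reserved yet): reserve another.
            next = nextBlockStart.getAsLong();
            limit = next + blockSize;
        }
        return next++;
    }
}
```

An entity constructor could then call nextId() on a shared allocator. With a block size of 50, only one call in fifty reaches the database, and any IDs left in a block when the application stops are simply lost as gaps.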

Barring that, the only "no compromises" option, as far as simplicity and understandability are concerned, appears to be UUIDs, which have their own problems.
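For comparison, a minimal sketch of the UUID variant, assuming a hypothetical Invoice class with JPA annotations omitted so that it compiles on its own. The id exists from the moment the object is constructed, so equals and hashCode never see a null.

```java
import java.util.UUID;

// Hypothetical entity; JPA annotations are omitted to keep the sketch self-contained.
class Invoice {
    private final UUID id = UUID.randomUUID(); // assigned at construction, never null

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof (not getClass()) and the getter are used so that a
        // subclass-based proxy of Invoice could still compare as equal.
        return o instanceof Invoice && id.equals(((Invoice) o).getId());
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

Two freshly created Invoice instances are distinct in a HashSet, and an instance keeps the same hash code before and after it is saved, since the id never changes.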

Am I missing something? Is there a JPA identity generator, or some library, that is based on sequences and allows the ID to be assigned at construction time?

Upvotes: 3

Views: 1056

Answers (1)

Vlad Mihalcea

Reputation: 154130

Using an assigned identifier is the best approach from a write perspective. It is also consistent across all entity state transitions, and you can even batch multiple inserts at the JDBC level.

When it comes to reading and indexing, a numeric column performs better, whereas an assigned identifier is either a unique logical key (e.g. a Social Security Number) or a unique identifier (e.g. a UUID). Using application-level unique assigned identifiers is complicated because you may have multiple application nodes (in a cluster), or you may need to coordinate inserts coming both from within the application and from external sources (e.g. a database client utility).

For database-assigned identifiers, you need to take into consideration how flushing is affected by your choice. Hibernate tries to defer Persistence Context flushing until the last possible moment. This strategy has traditionally been known as transactional write-behind.

Write-behind relates to Hibernate flushing rather than to any logical or physical transaction. During a transaction, the flush may occur multiple times.

The flushed changes are visible only to the current database transaction. Until the current transaction is committed, no change is visible to other concurrent transactions.

IDENTITY requires flushing, because the identifier is only known once the INSERT statement has been executed. A sequence call is non-transactional, so it doesn't require a flush. IDENTITY also disables JDBC insert batching, and it doesn't support pre-allocation.

JPA cannot assign the identifier at entity construction time, because a new instance can only become persistent through an EntityManager.persist() call. JPA requires explicit "entity state transitions".

Wasting a sequence identifier is not much of a problem. The database performs just fine even with gaps in sequence values. With a bigint column you will not, in practice, run out of sequence identifiers. It's better to have non-transactional sequence identifier allocation with occasional gaps than transactional allocation with a higher risk of deadlocks and contention.
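As a back-of-the-envelope check on the bigint claim, the following sketch estimates how long a signed 64-bit sequence would last at a constant allocation rate. The SequenceCapacity class and the rate of one million identifiers per second are illustrative assumptions, not figures from any particular database.

```java
final class SequenceCapacity {
    private static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60;

    // Whole years until a signed 64-bit sequence starting near zero is exhausted
    // at a constant allocation rate (wasted identifiers count as allocations too).
    static long yearsUntilExhausted(long idsPerSecond) {
        return Long.MAX_VALUE / idsPerSecond / SECONDS_PER_YEAR;
    }
}
```

SequenceCapacity.yearsUntilExhausted(1_000_000L) returns 292471, so even at a million identifiers per second, gaps included, the sequence would last roughly 292,000 years.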

Upvotes: 2
