Reputation: 31
Dear fellow programmers,
i have been given the task to update about 10 000 - 100 000 records in an Oracle 11g database EACH minute. The current state of those records are held in a global ArrayList so i don't need to SELECT all records on every update from the DB. A scheduler updates those records in the ArrayList at the beginning of each minute and then starts to update the records in the database.
I cannot change this fact, it is a customer requirement. To achieve high performance, those updates should be done by using the native batch update feature.
I am using a TomEE plume 7.0.2 application server with EclipseLink 2.6.3 (this version is included with TomEE).
Code:
@PersistenceContext(unitName = "MES_Tables")
private EntityManager em;
...
@Schedule(second="0", minute="*", hour="*", persistent=false)
public void startUpdate(){
Query q = em.createNativeQuery(
"UPDATE " +
"SCHEMA.PROPERTIES_GRP_CONT " +
"SET " +
"STRVAL = ? " + //<-- SQL-Param
"WHERE " +
"STATES_ID = 1 " +
"AND PROPERTIES_ID = ? " + //<-- SQL-Param
"AND PROPERTIES_GRP_ID = ?"); //<-- SQL-Param
for(BatchInfo bi : biList){
int rowsUpdated = q
.setParameter(1, Long.toString(bi.getLifetime()))
.setParameter(2, bi.getPropertiesId())
.setParameter(3, bi.getBatchId())
.executeUpdate();
}
}
Unfortunately those updates are executed as single updates and no batching is happening. So 10 000 updates are taking about 40-50 seconds. To my understanding the EntityManager (em) should automatically create batch updates if you execute multiple updates within a single for each loop. Even simplifying the SQL UPDATE to a statement without any parameters, so that always the same update is executed, did not change the fact that single updates were executed.
persistence.xml
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="MES_Tables" transaction-type="JTA">
<jta-data-source>MES_Connection</jta-data-source>
<exclude-unlisted-classes>false</exclude-unlisted-classes>
<properties>
<property name="javax.persistence.schema-generation.database.action" value="none" />
<property name="eclipselink.ddl-generation" value="none" />
<property name="eclipselink.logging.level" value="WARNING" />
<property name="eclipselink.logging.level.sql" value="FINE" />
<property name="eclipselink.logging.parameters" value="true" />
<property name="javax.persistence.query.timeout" value="1800000" />
<property name="eclipselink.jdbc.connections.wait-timeout" value="1800000" />
<property name="eclipselink.jdbc.batch-writing" value="JDBC" />
<property name="eclipselink.jdbc.batch-writing.size" value="600" />
<property name="eclipselink.logging.logger" value="mes.core.logging.EclipseLinkLogger"/>
</properties>
</persistence-unit>
</persistence>
To test if batch updating is working at all, i refactored the code to use a managed JPA entity instead of the native SQL UPDATE. The problem here is, that i need to perform a em.merge(entity) on each entity for it to be managed again. This is because the entities become unmanaged after committing (which is happening each minute in the scheduler).
This causes 10 000 slow SELECTs (30-40 seconds). After those SELECTs are finished, EclipseLink performs a fast batch update (3-4 seconds).
The last days i was trying to prevent EclipseLink from performing those SELECTs and just issue the update but without luck. From another stackoverflow post i found a method to do updates without the SELECT:
Perform UPDATE without SELECT in eclipselink
EntityManagerImpl emImpl = ((EntityManagerImpl) em.getDelegate());
UnitOfWork uow = emImpl.getUnitOfWork();
AbstractSession as = uow.getParent();
for(BatchInfo bi : biList)
as.updateObject(bi);
This unfortunately did not work also because of the following exception: org.eclipse.persistence.internal.sessions.IsolatedClientSession cannot be cast to org.eclipse.persistence.internal.sessions.UnitOfWorkImpl
I am out of options now and hopefully someone of you can give me a hint where to look at and solve this problem. It would be greatly appreciated.
I would rather have the native batch update working than the manipulating EclipseLink to not perform any SELECTs on merge.
Upvotes: 0
Views: 2740
Reputation: 31
After searching for a long time and trying different approaches (thanks to Chris) i found the simplest solution if you want to stay on the native side of JPA:
@Schedule(second="0", minute="*", hour="*", persistent=false)
public void startUpdate(){
String updateSql =
"UPDATE " +
"SCHEMA.PROPERTIES_GRP_CONT " +
"SET " +
"STRVAL = ? " + //<-- SQL-Param
"WHERE " +
"STATES_ID = 1 " +
"AND PROPERTIES_ID = ? " + //<-- SQL-Param
"AND PROPERTIES_GRP_ID = ?"; //<-- SQL-Param
java.sql.Connection connection = em.unwrap(java.sql.Connection.class);
PreparedStatement prepStatement = connection.prepareStatement(updateSql);
for(BatchInfo bi : biList){
prepStatement.setString(1, Long.toString(bi.getLifetime()));
prepStatement.setLong(2, bi.getPropertiesId());
prepStatement.setLong(3, bi.getBatchId());
prepStatement.addBatch();
}
prepStatement.executeBatch();
}
Warning: the important part (em.unwrap) may be EclipseLink specific and require JPA 2.1 or higher!
Upvotes: 3