Reputation: 379
So this is my maiden question, so here it goes... The idea is that I am doing a large number of update statements against a database table. In SQL each one would be a simple
update table_name set col1 = 123 where col2 = 456 and col1 is null
Since there are millions of these, it seemed best to batch some of them together. I followed the directions here:
http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/batch.html
and a few other pages I found randomly here on Stack Overflow and elsewhere, but they all read similarly.
My idea was to skip the read and just do the update directly, looping like this:
sessionFactory = new Configuration().configure("finaldetail/hibernate.dev.cfg.xml")
        .addPackage("com.xxx.app.ftm.domain.event")
        .addAnnotatedClass(FinalTrainDetail.class)
        .addAnnotatedClass(AbstractDetail.class)
        .addAnnotatedClass(FinalTrainDetailWork.class)
        .buildSessionFactory();
inputStream = new BufferedReader(new FileReader(new File(args[0])));
session = sessionFactory.openSession();
transaction = session.beginTransaction();
String s;
int count = 0;
while ((s = inputStream.readLine()) != null) {
    Query query = session.createQuery("update FinalTrainDetail detail set detail.trainSummary "
            + "=:summaryId where detail.trainDetail=:detailId and detail.trainSummary=null");
    query.setParameter("summaryId", new Long(s.substring(9, 18)));
    query.setParameter("detailId", new Long(s.substring(0, 9)));
    query.executeUpdate();
    count++;
    if (count % 20 == 0) {
        log.debug("should commit");
        session.flush();
        session.clear();
    }
}
    transaction.commit();
    System.out.println("exit");
} catch (IOException e) {
    transaction.rollback();
    log.error(e.toString());
} catch (Throwable t) {
    System.out.print(t);
    log.error("exception caught during updating offline", t);
    System.exit(2);
} finally {
    if (inputStream != null)
        inputStream.close();
    session.close();
}
So my understanding here is that flush will cause a commit every 20 updates, and that clear empties the first-level cache to avoid an OutOfMemory exception.
So far, the config I have is:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
"-//Hibernate/Hibernate Configuration DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
<session-factory>
<!-- SQL dialect -->
<property name="dialect">org.hibernate.dialect.Oracle10gDialect</property>
<!-- Database connection settings -->
<property name="connection.driver_class">oracle.jdbc.driver.OracleDriver</property>
<property name="connection.url">jdbc:oracle:thin:@dev264.oracle.XXXX.com:1521:DEV264</property>
<property name="connection.username">XXXX</property>
<property name="connection.password">XXX</property>
<property name="connection.shutdown">true</property>
<!-- JDBC connection pool (use the built-in one) -->
<property name="connection.pool_size">1</property>
<!-- Enable Hibernate's automatic session context management -->
<property name="current_session_context_class">thread</property>
<!-- Disable the second-level cache -->
<property name="cache.provider_class">org.hibernate.cache.NoCacheProvider</property>
<!-- JDBC batch size -->
<property name="jdbc.batch_size">20</property>
<!-- Echo all executed SQL to stdout -->
<property name="show_sql">true</property>
</session-factory>
</hibernate-configuration>
show_sql is enabled for debugging purposes. What I don't get, or am not seeing, is that when I set this in log4j:
<logger name="org.hibernate.transaction">
<level value="debug"/>
<appender-ref ref="file"/>
<appender-ref ref="console"/>
</logger>
I only see
[DEBUG] [main] [org.hibernate.transaction.JDBCTransaction] [commit]
[DEBUG] [main] [org.hibernate.transaction.JDBCTransaction] [commit]
[DEBUG] [main] [org.hibernate.transaction.JDBCTransaction] [commit]
at the very end of the log file, and not at the points where the flush occurs. So what I am wondering is: is commit really being called every 20 records? Am I building up too many objects in memory, and will I get an OutOfMemory exception in production when this runs, given that I don't have hundreds of thousands of test records?
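To make sure I'm reasoning about the chunk boundaries correctly, here is the counting logic pulled out of Hibernate into plain Java (just a sketch; the record totals and chunk size of 20 are stand-ins for my real loop):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkBoundaries {
    // Returns the 1-based record numbers at which a flush would fire,
    // given a total record count and a chunk size (20 in my loop).
    static List<Integer> boundaries(int totalRecords, int chunkSize) {
        List<Integer> points = new ArrayList<>();
        int count = 0;
        for (int i = 0; i < totalRecords; i++) {
            count++;
            if (count % chunkSize == 0) {
                points.add(count);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // 45 records with chunk size 20 -> boundaries at records 20 and 40;
        // the last 5 records are only handled at the final commit.
        System.out.println(boundaries(45, 20));
    }
}
```

So the boundaries land where I expect; the open question is what actually happens at each boundary, a flush or a real commit.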
Upvotes: 1
Views: 996
Reputation: 379
I would have commented on your answer above but the word count is too high...
OK, so at least you didn't send me a link to this question having been asked before, so that makes me feel slightly better, but I think I see your point.
So this version of the app was revision 3, and there will soon be a revision 4. Point taken about not recreating the query. My original version did a read (into memory) and then called a setter (changing the object in memory). Somewhere I got the idea to skip the select and just do the update. If I had done a read and then changed state, flushing and clearing would be necessary. Since there is no read, there is nothing to flush or clear, so there is no way I can run out of memory on the first-level cache. My real concern here was Oracle using too much program global area (PGA) memory and taking up too much undo space. I read about that here:
http://www.oracle.com/technetwork/issue-archive/2008/08-mar/o28plsql-095155.html
So every 20 to 100 updates, don't call flush on the session but commit on the transaction. Plus, I should track the number of rows actually changed, from the return value of executeUpdate, instead of just counting up every time the query gets executed. So I wind up with something like this:
Query query = session.createQuery("update FinalTrainDetail detail set detail.trainSummary "
        + "=:summaryId where detail.trainDetail=:detailId and detail.trainSummary is null");
transaction = session.beginTransaction();
while ((s = inputStream.readLine()) != null) {
    query.setParameter("summaryId", new Long(s.substring(9, 18)));
    query.setParameter("detailId", new Long(s.substring(0, 9)));
    count += query.executeUpdate();
    if (count % 100 == 0) {
        log.debug("should commit");
        transaction.commit();
        transaction = session.beginTransaction();
    }
}
transaction.commit();
System.out.println("exit");
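One thing I need to watch with count += query.executeUpdate(): since count now advances by the number of rows changed rather than by 1, count % 100 == 0 can be stepped over entirely if an update ever touches more than one row, and the commit for that chunk never fires. A plain-Java illustration (the per-update row counts are made up):

```java
public class CommitCheck {
    // Simulates the counter: how many commits fire when using the
    // exact-multiple check on a counter advanced by rows changed.
    static int commitsWithModulo(int[] rowsPerUpdate, int chunkSize) {
        int count = 0, commits = 0;
        for (int rows : rowsPerUpdate) {
            count += rows;
            if (count % chunkSize == 0) commits++;
        }
        return commits;
    }

    // Safer: commit whenever the counter crosses a chunk boundary,
    // even if it jumps past the exact multiple.
    static int commitsWithThreshold(int[] rowsPerUpdate, int chunkSize) {
        int count = 0, lastCommitAt = 0, commits = 0;
        for (int rows : rowsPerUpdate) {
            count += rows;
            if (count - lastCommitAt >= chunkSize) {
                commits++;
                lastCommitAt = count;
            }
        }
        return commits;
    }

    public static void main(String[] args) {
        // 99 single-row updates, then one 2-row update: count jumps 99 -> 101
        int[] rows = new int[100];
        java.util.Arrays.fill(rows, 1);
        rows[99] = 2;
        System.out.println(commitsWithModulo(rows, 100));    // 0: boundary stepped over
        System.out.println(commitsWithThreshold(rows, 100)); // 1: boundary still caught
    }
}
```

In my case each update should match at most one row (the detailId plus the is-null guard), so the modulo check is probably fine, but the threshold form is the safer habit.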
Upvotes: 0
Reputation: 692231
You're confusing flush() and commit(). flush() doesn't commit the transaction. All it does is execute update and delete statements to write, to the database, the changes that have been applied in memory to attached entities and that haven't been made persistent yet.
In your case, flushing and clearing the session is useless (but harmless), since you don't apply any change in memory, and the session is thus always empty.
AFAIK, creating a new query at each iteration is also useless. You could reuse the same query again and again. And detail.trainSummary=null is incorrect. It should be detail.trainSummary is null.
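To make the flush/commit distinction concrete, here is a toy model in plain Java (this is an analogy, not Hibernate's implementation): flush() sends the pending statements to the database inside the still-open transaction, and only commit() makes that work durable.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the flush vs. commit distinction.
public class FlushVsCommit {
    private final List<String> pendingChanges = new ArrayList<>();  // in memory, not yet flushed
    private final List<String> transactionWork = new ArrayList<>(); // SQL executed, not committed
    private final List<String> database = new ArrayList<>();        // durable

    void change(String change) { pendingChanges.add(change); }

    // flush(): execute the pending statements against the connection,
    // but inside the open transaction -- nothing is durable yet.
    void flush() {
        transactionWork.addAll(pendingChanges);
        pendingChanges.clear();
    }

    // commit(): flush anything still pending, then make the work durable.
    void commit() {
        flush();
        database.addAll(transactionWork);
        transactionWork.clear();
    }

    List<String> durable() { return database; }
    List<String> uncommitted() { return transactionWork; }

    public static void main(String[] args) {
        FlushVsCommit tx = new FlushVsCommit();
        tx.change("update row 1");
        tx.flush();
        System.out.println(tx.durable().size()); // 0: flushed but not committed
        tx.commit();
        System.out.println(tx.durable().size()); // 1: now durable
    }
}
```

That's why your JDBCTransaction commit log lines only appear at the end: the real commit happens once, when transaction.commit() is called.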
Upvotes: 1