Reputation: 5995
I'm running an import job that worked well until a couple of days ago, when the number of entities to import increased dramatically.
What happens is that I get a "Lock wait timeout exceeded" error. The application then retries, and an exception is thrown because I call em.getTransaction().begin(); a second time.
To work around this I changed innodb_lock_wait_timeout to 120 and lowered the batch size to 50 entities.
What I can't figure out is how to handle all of this properly in code. I don't want the entire import to fail because of locking. How would you handle this? Do you have a code example? Maybe some other thoughts? Please go nuts!
My BatchPersister:
public class BatchPersister implements Persister {

    private final static Log log = getLog(BatchPersister.class);
    private WorkLogger workLog = WorkLogger.instance();
    private static final int BATCH_SIZE = 500;

    private int persistedObjects;
    private long startTime;
    private UpdateBatch batch;
    private String dataSource;

    public BatchPersister(String dataSource) {
        this.dataSource = dataSource;
    }

    public void persist(Persistable obj) {
        persistedObjects++;
        logProgress(100);

        if (batch == null)
            batch = new UpdateBatch(BATCH_SIZE, dataSource);

        batch.add(obj);

        if (batch.isFull()) {
            batch.persist();
            batch = null;
        }
    }
}
UpdateBatch
public class UpdateBatch {

    private final static Log log = LogFactory.getLog(UpdateBatch.class);
    private WorkLogger workLogger = WorkLogger.instance();

    private final Map<Object, Persistable> batch;
    private final EntityManager em;
    private int size;

    /**
     * Initializes the batch and specifies its size.
     */
    public UpdateBatch(int size, String dataSource) {
        this.size = size;
        batch = new LinkedHashMap<Object, Persistable>();
        em = EmFactory.getEm(dataSource);
    }

    public void persist() {
        log.info("Persisting " + this);

        em.getTransaction().begin();
        persistAllToDB();
        em.getTransaction().commit();

        WorkLog batchLog = new WorkLog(IMPORT_PERSIST, IN_PROGRESS);
        batchLog.setAffectedItems(batch.size());
        workLogger.log(batchLog);

        em.close();
    }

    /**
     * Persists all data in this update batch.
     */
    private void persistAllToDB() {
        for (Persistable persistable : batch.values())
            em.persist(persistable);
    }

    @Override
    public String toString() {
        final ArrayList<Persistable> values = new ArrayList<Persistable>(batch.values());
        Persistable first = values.get(0);
        Persistable last = values.get(values.size() - 1);
        return "UpdateBatch[" +
                first.getClass().getSimpleName() + "(" + first.getId() + ")" +
                " - " +
                last.getClass().getSimpleName() + "(" + last.getId() + ")" +
                "]";
    }
}
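For reference, the exception on retry is most likely an IllegalStateException: EntityTransaction.begin() throws it when a transaction is still active, so a retry that re-enters persist() without rolling back first will fail. A rough, hypothetical sketch of a guard around persist() (persistWithRetry and maxRetries are not part of the code above):

    public void persistWithRetry(int maxRetries) {
        for (int attempt = 1; ; attempt++) {
            try {
                persist();
                return;
            } catch (RuntimeException e) {
                // the lock wait timeout surfaces as a runtime exception while the
                // transaction is still active; roll it back before retrying,
                // otherwise the next em.getTransaction().begin() throws
                // IllegalStateException
                if (em.getTransaction().isActive()) {
                    em.getTransaction().rollback();
                }
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }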
Upvotes: 3
Views: 214
Reputation: 9443
Pitor already named a couple of options. I would point out that a variation of his "Solution 2" would be to leverage the Hibernate StatelessSession API instead of using a Session and clearing it.
However, something else you should consider is that a transaction is a grouping of statements that are expected to fail or succeed as a whole. If you have a bunch of statements, one in the middle fails, and you still want all the preceding statements to be persisted, then you should not group them together in a single transaction. Group your statements into transactions accordingly. It is generally a good idea to enable JDBC batching in Hibernate anyway; it usually leads to more efficient database communication.
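A minimal sketch of the StatelessSession variant, assuming a Hibernate SessionFactory can be reached (for example by unwrapping the EntityManagerFactory); a StatelessSession has no first-level cache, so there is nothing to flush or clear:

    // assumption: emf is the application's Hibernate-backed EntityManagerFactory
    SessionFactory sessionFactory = emf.unwrap(SessionFactory.class);

    StatelessSession session = sessionFactory.openStatelessSession();
    Transaction tx = session.beginTransaction();
    try {
        for (Persistable persistable : batch.values()) {
            // insert() issues the SQL immediately; no persistence context is kept
            session.insert(persistable);
        }
        tx.commit();
    } catch (RuntimeException e) {
        tx.rollback();
        throw e;
    } finally {
        session.close();
    }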
Upvotes: 0
Reputation: 22672
Solution 1. Do not use JPA; it was not designed for massive database operations. Since you have access to your DataSource and you are managing transactions manually, there is nothing stopping you from using plain old SQL.
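A minimal sketch of what that could look like with plain JDBC batching. This assumes a javax.sql.DataSource for the same database is available (in the question's code "dataSource" is only a name passed to EmFactory), and the table and column names are made up for illustration:

    private void persistAllWithJdbc(DataSource ds) throws SQLException {
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO entity (id, name) VALUES (?, ?)")) {
                for (Persistable persistable : batch.values()) {
                    ps.setObject(1, persistable.getId());
                    ps.setString(2, persistable.getName()); // hypothetical getter
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip for the whole batch
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }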
Solution 2. There may be a performance problem connected with the persistence context's first-level cache: every persisted entity is kept in that cache, and when the cache grows large it can hurt performance (mostly memory).
To improve the situation, set the hibernate.jdbc.batch_size property (or its equivalent, if you are not using Hibernate as your JPA implementation) to roughly 20; that way inserts are sent to the database in packs of 20 statements.
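With Hibernate as the provider, the property can go in persistence.xml or, as sketched here, be passed programmatically when the EntityManagerFactory is created (the persistence unit name "import-unit" is made up):

    Map<String, String> props = new HashMap<String, String>();
    // send inserts to the database in packs of 20 statements
    props.put("hibernate.jdbc.batch_size", "20");
    // optional: group inserts by entity type so batching is not broken up
    props.put("hibernate.order_inserts", "true");

    EntityManagerFactory emf =
            Persistence.createEntityManagerFactory("import-unit", props);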
Second, clear the persistence context every 20 operations to force synchronization with the database:
    private void persistAllToDB() {
        int counter = 0;
        for (Persistable persistable : batch.values()) {
            em.persist(persistable);
            counter++;
            if (counter % 20 == 0) {
                // flush pending inserts and clear the first-level cache
                // so it does not grow with the size of the batch
                em.flush();
                em.clear();
            }
        }
    }
Solution 3. Tune the MySQL InnoDB engine (see http://dev.mysql.com/doc/refman/5.1/en/insert-speed.html and http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html). If your table is heavily indexed, that can hurt insert performance.
Those are my speculations; I hope something here helps you.
Upvotes: 1