super.t

Reputation: 2736

Spring Batch memory leak - CSV to database using JpaItemWriter

I had a problem with a Spring Batch job that reads a large CSV file (a few million records) and saves its records to a database. The job uses a FlatFileItemReader to read the CSV and a JpaItemWriter to write the read and processed records to the database. The problem is that JpaItemWriter doesn't clear the persistence context after flushing each chunk of items to the database, so managed entities accumulate and the job eventually fails with an OutOfMemoryError.

I have solved the problem by extending JpaItemWriter and overriding the write method so that it calls EntityManager.clear() after writing each chunk, but I was wondering whether Spring Batch already addresses this issue and the root of the problem lies in my job config. What is the right way to address this issue?

My solution:

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.orm.jpa.EntityManagerFactoryUtils;

class ClearingJpaItemWriter<T> extends JpaItemWriter<T> {

    private EntityManagerFactory entityManagerFactory;

    @Override
    public void write(List<? extends T> items) {
        // Delegate to JpaItemWriter, which merges and flushes the chunk.
        super.write(items);

        EntityManager entityManager =
                EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);
        if (entityManager == null) {
            throw new DataAccessResourceFailureException("Unable to obtain a transactional EntityManager");
        }

        // Detach the flushed entities so the persistence context
        // doesn't keep growing with every chunk.
        entityManager.clear();
    }

    @Override
    public void setEntityManagerFactory(EntityManagerFactory entityManagerFactory) {
        super.setEntityManagerFactory(entityManagerFactory);
        // Keep our own reference; the field in JpaItemWriter is private.
        this.entityManagerFactory = entityManagerFactory;
    }
}

Note the added entityManager.clear() call in the write method: it detaches the entities from the persistence context once each chunk has been flushed.

Job config:

@Bean
public JpaItemWriter<Appointment> postgresWriter() {
    JpaItemWriter<Appointment> writer = new ClearingJpaItemWriter<>();
    writer.setEntityManagerFactory(pgEntityManagerFactory);
    return writer;
}

@Bean
public Step appointmentInitStep(JpaItemWriter<Appointment> writer, FlatFileItemReader<Appointment> reader) {
    return stepBuilderFactory.get("initEclinicAppointments")
            .transactionManager(platformTransactionManager)
            .<Appointment, Appointment>chunk(5000)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skipLimit(1000)
            .skip(FlatFileParseException.class)
            .build();
}

@Bean
public Job appointmentInitJob(@Qualifier("appointmentInitStep") Step step) {
    return jobBuilderFactory.get(JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .preventRestart()
            .start(step)
            .build();
}

Upvotes: 0

Views: 1523

Answers (1)

Mahmoud Ben Hassine

Reputation: 31600

That's a valid point. The JpaItemWriter (and HibernateItemWriter) used to clear the persistence context, but this was removed in BATCH-1635 (here is the commit that removed it). However, this behavior was re-added and made configurable in the HibernateItemWriter in BATCH-1759 through the clearSession parameter (see this commit), but not in the JpaItemWriter.
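For reference, this is roughly how the clearSession flag is used on the HibernateItemWriter (a minimal sketch; the sessionFactory bean and the Appointment entity are assumed from the question's context):

@Bean
public HibernateItemWriter<Appointment> hibernateWriter(SessionFactory sessionFactory) {
    HibernateItemWriter<Appointment> writer = new HibernateItemWriter<>();
    writer.setSessionFactory(sessionFactory);
    // Clear the Hibernate session after each chunk is written;
    // this is the configurable behavior added in BATCH-1759.
    writer.setClearSession(true);
    return writer;
}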

So I would suggest opening an issue against Spring Batch to add the same option to the JpaItemWriter, so that the persistence context can be cleared after writing items (this would be consistent with the HibernateItemWriter).
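If such an option were added, the writer configuration from the question could drop the custom subclass and look something like this (hypothetical sketch; setClearPersistenceContext is not part of the JpaItemWriter API at the time of writing, and the name is an assumption modeled on HibernateItemWriter's clearSession):

@Bean
public JpaItemWriter<Appointment> postgresWriter() {
    JpaItemWriter<Appointment> writer = new JpaItemWriter<>();
    writer.setEntityManagerFactory(pgEntityManagerFactory);
    // Hypothetical flag mirroring HibernateItemWriter#setClearSession;
    // not an existing JpaItemWriter method at the time of writing.
    writer.setClearPersistenceContext(true);
    return writer;
}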

That said, to answer your question: you can indeed use a custom writer to clear the persistence context, as you did.

Hope this helps.

Upvotes: 2
