Andrew Regan
Andrew Regan

Reputation: 5113

Spring Batch memory leak with FlowJob, repeating Steps

We've set up a Spring Batch batch email-sender job that does the following:

  1. Read incoming email details from a JMS Queue.
  2. Don't do any processing, just pass them through...
  3. Turn the details into emails and "write" them to SMTP
  4. Repeat ad infinitum

This generally works fine, but we've discovered a massive memory leak. After 9 hours or so, can see a single FlowJob, but 748 JobExecution (all 'STARTED'), each holding 778 (!) StepExecution instances. All in all, 900 MB of stuff.

Here's the Spring config (Spring 3.1, Spring Batch 2.1.9):

<bean id="jobRepository"
    class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />

<bean id="jobLauncher"
    class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
    <property name="taskExecutor">
        <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
    </property>
</bean>

<bean id="jmsEmailFetcher" class="org.springframework.batch.item.jms.JmsItemReader">
    <property name="jmsTemplate" ref="batchEmailJmsTemplate" />
</bean>

<bean id="passthroughProcessor" class="org.springframework.batch.item.support.PassThroughItemProcessor" />

<bean id="transactionManager"
    class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

<!-- The Spring Batch *Limiter/Decider* -->

<bean id="ourLimitDecider" class="our.special.InfiniteThrottledExecutionDecider" />

<!-- The Spring Batch *Job* -->

<batch:job id="fetchEmailsJobBatch" restartable="true">
    <batch:step id="fetchEmailsStep" next="limitDecision">
        <batch:tasklet throttle-limit="10">
            <batch:chunk reader="jmsEmailFetcher" processor="passthroughProcessor"
                         writer="batchEmailService.javaMailItemWriter" commit-interval="100" skip-limit="999999999">  <!-- Might gets *lots* of MQ-not-up-yet fails -->
                <batch:skippable-exception-classes>
                    <batch:include class="org.springframework.jms.JmsException" />       <!-- Generally MQ-not-up-yet ConnectionException, or Session-closed (in tests) -->
                    <batch:include class="java.lang.IllegalStateException" />            <!-- Yuk, usually/presumably a test SMTP server isn't yet connected -->
                    <batch:include class="org.springframework.mail.MailSendException" /> <!-- SMTP down... -->
                </batch:skippable-exception-classes>
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
    <batch:decision id="limitDecision" decider="ourLimitDecider">
        <batch:next on="CONTINUE" to="fetchEmailsStep" />
        <batch:end on="COMPLETED" />
    </batch:decision>
</batch:job>

Our InfiniteThrottledExecutionDecider basically returns new FlowExecutionStatus("CONTINUE") every time, to ensure that the fetchEmailsStep gets executed at the end of the flow, and that the Job never completes - at least not until we're ready to stop it ourselves.

We're not using a database, so we expect some stuff to be held in memory, but not a complete record of everything that's ever been run...

Is there something wrong with our configuration? Or our approach?


Here's a bit from our log file for what we're told is Step #778 and what should be the one and only Job instance.

23:58:18,782 - INFO  (org.springframework.batch.core.job.SimpleStepHandler) - Duplicate step [fetchEmailsStep] detected in execution of job=[fetchEmailsJobBatch]. If either step fails, both will be executed again on restart.
23:59:52,257 - INFO  (org.springframework.batch.core.job.SimpleStepHandler) - Executing step: [fetchEmailsStep]
23:59:52,257 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Executing: id=778
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Starting repeat context.
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat operation about to start at count=1
23:59:52,259 - DEBUG (org.springframework.batch.core.scope.context.StepContextRepeatCallback) - Preparing chunk execution for StepContext: org.springframework.batch.core.scope.context.StepContext@1be1ee
23:59:52,259 - DEBUG (org.springframework.batch.core.scope.context.StepContextRepeatCallback) - Chunk execution starting: queue size=0
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Starting repeat context.
23:59:52,259 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat operation about to start at count=1
... 5 second JMS timeout ...
23:59:57,716 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat is complete according to policy and result value.
23:59:57,716 - DEBUG (org.springframework.batch.core.step.item.ChunkOrientedTasklet) - Inputs not busy, ended: true
23:59:57,716 - DEBUG (org.springframework.batch.core.step.tasklet.TaskletStep) - Applying contribution: [StepContribution: read=0, written=0, filtered=0, readSkips=0, writeSkips=0, processSkips=0, exitStatus=EXECUTING]
23:59:57,719 - DEBUG (org.springframework.batch.core.step.tasklet.TaskletStep) - Saving step execution before commit: StepExecution: id=778, version=1, name=fetchEmailsStep, status=STARTED, exitStatus=EXECUTING, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0, exitDescription=
23:59:57,721 - DEBUG (org.springframework.batch.repeat.support.RepeatTemplate) - Repeat is complete according to policy and result value.
23:59:57,721 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Step execution success: id=778
23:59:57,722 - DEBUG (org.springframework.batch.core.step.AbstractStep) - Step execution complete: StepExecution: id=778, version=3, name=fetchEmailsStep, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Completed state=fetchEmailsJobBatch.fetchEmailsStep with status=COMPLETED
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Handling state=fetchEmailsJobBatch.limitDecision100
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Completed state=fetchEmailsJobBatch.limitDecision100 with status=CONTINUE
23:59:57,723 - DEBUG (org.springframework.batch.core.job.flow.support.SimpleFlow) - Handling state=fetchEmailsJobBatch.fetchEmailsStep

Problem is, the heap dump shows 748 JobExecutions, and 581,911 StepExecutions. Where are they all coming from?

Upvotes: 2

Views: 6065

Answers (1)

Andrew Regan
Andrew Regan

Reputation: 5113

OK, in answer to my question, the solution was to drop the Step Flow and make "fetchEmailsStep" run forever.

We achieved that by removing our JmsTemplate ("batchEmailJmsTemplate") 's "readTimeout", and subclassing org.springframework.batch.item.jms.JmsItemReader to remove the check it does to stop you using infinite JmsTemplates - and add extra logging. If you don't want the override, you could set the "readTimeout" to a huge number.

More discussion here: http://forum.springsource.org/showthread.php?132468-Massive-memory-use-with-FlowJob-repeating-Steps&p=431279

I still don't know why the memory usage was quite as huge as it was...


[Edit] I was also using a JobExecutionDecider ("our.special.InfiniteThrottledExecutionDecider") to achieve throttling (based on the <StepExecution>.writeCount), but this had to go. We're now achieving the same result using a new class that extends StepExecutionListenerSupport (or you can implement StepExecutionListener). An instance of this bean is added as a <batch:listener> to our single <batch:step>

Upvotes: 1

Related Questions