ivish

Reputation: 612

Spring Batch - MultiResourceItemReader: how to make job parameters unique and restart a job

I am using the Spring Batch module to process CSV files from a directory. The directory may contain multiple files with a specific extension, and I am using MultiResourceItemReader to read them. The job receives three job parameters: read_from_directory, move_to_directory and a default_user_id. All of these parameters remain the same across job runs. read_from_directory contains multiple CSV files, and the job should process them one after another. The problem I am facing is that, since the job parameters are identical, I get a JobInstanceAlreadyCompleteException when the job is run a second time. I am aware this can be overcome by adding a timestamp parameter to make the job parameters unique. But since a timestamp parameter makes every job instance unique, I don't wish to use this approach: it would create issues in making my job restartable. So I would like some suggestions on:

  1. How can I make each job instance unique without using a timestamp parameter?

  2. How can the job be made restartable in this case? Will adding 'restartable="true"' suffice, or will it take some additional configuration/coding on my part? I am a little confused here because the job reads multiple files from a directory. If the job fails, for example due to an incorrect record in one of the files, how can I restart the same job from where it left off? I have configured the job to run periodically, at a certain time interval, using a scheduler. So if the job fails and I then rectify the error in the CSV file, will the job start from where it left off the next time it runs?

Please find below the relevant part of my configuration:

    <batch:job id="testJob" restartable="true">
        <batch:step id="step1">
            <batch:tasklet>
                <batch:chunk reader="multiResourceItemReader" writer="fileWriter"
                    commit-interval="1">
                </batch:chunk>
            </batch:tasklet>
        </batch:step>
    </batch:job>

    <bean id="fileWriter" class="com.ivish.TestFileWriter" />
    <bean id="multiResourceItemReader" class="org.springframework.batch.item.file.MultiResourceItemReader" scope="step">
        <property name="resources" value="file:#{jobParameters['read_from_directory']}/*.csv" />
        <property name="delegate" ref="fileReader" />
    </bean>

    <bean id="fileReader" class="com.ivish.TestFileReader" scope="step">
        <property name="delegate" ref="delegateFileReader" />
        <property name="moveToDirectory" value="#{jobParameters['move_to_directory']}" />
    </bean>
    <bean id="delegateFileReader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <property name="lineTokenizer" ref="fileTokenizer" />
                <property name="fieldSetMapper">
                    <bean
                        class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
                </property>
            </bean>
        </property>
    </bean>         

Thank you.

Upvotes: 0

Views: 1690

Answers (2)

Michael Minella

Reputation: 21463

Spring Batch has two distinct concepts related to job "runs": the JobInstance and the JobExecution.

The JobInstance is the concept of a logical run. It is identified by a unique set of identifying job parameters. In your example, I'd expect one JobInstance for each combination of read_from_directory, move_to_directory, and default_user_id.

The other concept is the JobExecution. This represents a physical run. So, for example, if you run the combination of read_from_directory, move_to_directory, and default_user_id and it passes, the JobInstance would have one child JobExecution. However, if the first attempt (the first JobExecution) were to fail, you could restart the job. Restarting would create a new JobExecution under the existing JobInstance (two physical runs under one logical run).

With the above in mind, each JobInstance would be unique via the combination of read_from_directory, move_to_directory, default_user_id, and a run id of some kind (Spring Batch provides a counter-based one out of the box, or you can use timestamps).
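
For the counter-based option, a minimal sketch (adapting the testJob configuration from the question; the bean name runIdIncrementer is illustrative) wires Spring Batch's out-of-the-box RunIdIncrementer into the job through the incrementer attribute:

    <batch:job id="testJob" restartable="true" incrementer="runIdIncrementer">
        <!-- steps unchanged -->
    </batch:job>

    <bean id="runIdIncrementer"
        class="org.springframework.batch.core.launch.support.RunIdIncrementer" />

Launching the next instance (for example via JobOperator#startNextInstance) then adds an incrementing run.id parameter on top of the three directory parameters, so a new JobInstance is created only when you ask for one, while a restart of a failed execution reuses the existing instance.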

You can read more about the concepts of JobInstance and JobExecution in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/domain.html#domainJob

Upvotes: 1

Nenad Bozic

Reputation: 3784

Question 1: You can implement JobParametersIncrementer with your own logic for what counts as the next instance and how the parameters should be incremented. You can then launch the next instance of your job when, according to your logic, it is time for one; otherwise, a plain run will restart the latest instance. If you start the job with CommandLineJobRunner you can pass -next to run the next instance, and if you do it programmatically you can use JobOperator#startNextInstance(String jobName). A sketch of such an incrementer is shown below.
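
A minimal sketch (the class name DirectoryRunIncrementer and the run.id key are illustrative, mirroring the RunIdIncrementer that ships with Spring Batch):

    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.JobParametersIncrementer;

    public class DirectoryRunIncrementer implements JobParametersIncrementer {

        @Override
        public JobParameters getNext(JobParameters parameters) {
            // Keep the existing read_from_directory/move_to_directory/default_user_id
            // parameters and bump only a numeric run.id.
            long nextRunId = 1;
            if (parameters != null && parameters.getParameters().containsKey("run.id")) {
                nextRunId = parameters.getLong("run.id") + 1;
            }
            return new JobParametersBuilder(parameters == null ? new JobParameters() : parameters)
                    .addLong("run.id", nextRunId)
                    .toJobParameters();
        }
    }

The incrementer is only consulted when you explicitly ask for the next instance (-next or startNextInstance), which is what keeps restarts of a failed instance possible.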

Question 2: For restartability, adding restartable="true" should do the trick. FlatFileItemReader, the delegate that actually reads the files, extends AbstractItemCountingItemStreamItemReader, which saves its state (the item count) between executions. As for MultiResourceItemReader, the documentation says:

Input resources are ordered using setComparator(Comparator) to make sure resource ordering is preserved between job runs in restart scenario.

So the list of resources is ordered, and each resource is delegated to the FlatFileItemReader, which preserves order and item count between runs.
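
If you want to make that ordering explicit, you can supply your own comparator. A minimal sketch mirroring the default behaviour (ordering by filename; the class name FileNameComparator is illustrative):

    import java.util.Comparator;

    import org.springframework.core.io.Resource;

    public class FileNameComparator implements Comparator<Resource> {

        @Override
        public int compare(Resource first, Resource second) {
            // A deterministic filename order keeps the resource index stable,
            // so a restarted execution resumes at the correct file.
            return first.getFilename().compareTo(second.getFilename());
        }
    }

This would be wired into the multiResourceItemReader bean with <property name="comparator" ref="fileNameComparator" />.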

Upvotes: 0
