Reputation: 612
I am using the Spring Batch module to process CSV files from a directory. The directory may contain multiple files with a specific extension, and I am using MultiResourceItemReader to read them. The job receives three job parameters: read_from_directory, move_to_directory and a default_user_id. These parameters remain the same for every job run. read_from_directory contains multiple CSV files, and the job should process these files one after another.

The problem I am facing is that, since the job parameters are the same, I get a JobInstanceAlreadyCompleteException when the job is run a second time. I am aware this problem can be overcome by adding a timestamp parameter to make the job parameters unique. But since a timestamp parameter makes every job instance unique, I don't wish to use this approach, because it will create issues in making my job restartable. So I would like some suggestions on:
1. How can I make each job instance unique without using a timestamp parameter?
2. How can the job be made restartable in this case? Will adding 'restartable="true"' suffice, or will it take some additional configuration/coding on my part? I am a little confused here because the job reads multiple files from a directory. So if the job fails, for example due to an incorrect record in one of the files, how can I restart the same job from where it left off? I have configured the job to run periodically, after a certain time interval, using a scheduler. So if the job fails and I then rectify the error in the CSV file, will the job start from where it left off the next time it runs?
Please find below the relevant part of my configuration:
<batch:job id="testJob" restartable="true">
<batch:step id="step1">
<batch:tasklet>
<batch:chunk reader="multiResourceItemReader" writer="fileWriter"
commit-interval="1">
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="fileWriter" class="com.ivish.TestFileWriter" />
<bean id="multiResourceItemReader" class="org.springframework.batch.item.file.MultiResourceItemReader" scope="step">
<property name="resources" value="file:#{jobParameters['read_from_directory']}/*.csv" />
<property name="delegate" ref="fileReader" />
</bean>
<bean id="fileReader" class="com.ivish.TestFileReader" scope="step">
<property name="delegate" ref="delegateFileReader" />
<property name="moveToDirectory" value="#{jobParameters['move_to_directory']}" />
</bean>
<bean id="delegateFileReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer" ref="fileTokenizer" />
<property name="fieldSetMapper">
<bean
class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
</property>
</bean>
</property>
</bean>
Thank you.
Upvotes: 0
Views: 1690
Reputation: 21463
Spring Batch has two distinct concepts related to job "runs": the JobInstance and the JobExecution.
The JobInstance is the concept of a logical run. It is identified by a unique set of identifying job parameters. In your example, I'd expect one JobInstance for each combination of read_from_directory, move_to_directory, and default_user_id.
The other concept is the JobExecution. This represents a physical run. For example, if you run the combination of read_from_directory, move_to_directory, and default_user_id and it passes, the JobInstance would have one child JobExecution. However, if the first attempt (the first JobExecution) were to fail, you could restart the job. Restarting would create a new JobExecution under the existing JobInstance (two physical runs under one logical run).
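If you trigger the restart programmatically, JobOperator#restart(long executionId) is the hook for that. A minimal sketch, assuming a JobOperator bean is already configured and that you have the id of the failed execution (the failedExecutionId name is mine, not from your setup):

import org.springframework.batch.core.launch.JobOperator;

public class RestartExample {

    private final JobOperator jobOperator; // wired from the application context

    public RestartExample(JobOperator jobOperator) {
        this.jobOperator = jobOperator;
    }

    public Long restartFailedRun(long failedExecutionId) throws Exception {
        // Creates a second JobExecution under the same JobInstance and
        // resumes from the state the step saved before failing.
        return jobOperator.restart(failedExecutionId);
    }
}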
With the above in mind, each JobInstance would be unique via the combination of read_from_directory, move_to_directory, default_user_id, and a run id of some kind (Spring Batch provides a counter-based one out of the box, or you can use timestamps).
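The counter-based option is org.springframework.batch.core.launch.support.RunIdIncrementer; in the XML namespace you would reference a bean of that class from the incrementer attribute of <batch:job>. A minimal sketch of its effect on the parameters (the directory values are placeholders of mine, and this assumes a Spring Batch version whose RunIdIncrementer carries the existing parameters forward):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;

public class RunIdExample {

    public static void main(String[] args) {
        JobParameters params = new JobParametersBuilder()
                .addString("read_from_directory", "/data/in")   // placeholder values
                .addString("move_to_directory", "/data/done")
                .addString("default_user_id", "batch_user")
                .toJobParameters();

        // RunIdIncrementer keeps the existing parameters and bumps a numeric
        // "run.id", so every new run becomes a new, unique JobInstance.
        JobParameters next = new RunIdIncrementer().getNext(params);
        System.out.println(next.getLong("run.id")); // 1 on the first run, 2 on the next, ...
    }
}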
You can read more about the concepts of JobInstance and JobExecution in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/domain.html#domainJob
Upvotes: 1
Reputation: 3784
Question 1: You can implement JobParametersIncrementer with your own logic for what the next instance is and how you want to increment the parameters. You can then run a new instance of your job whenever, based on your logic, it is time for one; otherwise, launching with the same parameters will restart the latest instance. If you start the job with CommandLineJobRunner you can pass -next to run the next instance, and if you do it programmatically you can use JobOperator#startNextInstance(String jobName).
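A minimal sketch of such an incrementer, assuming a simple counter (the run.counter parameter name is my choice, not something Spring Batch mandates):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

public class CountingIncrementer implements JobParametersIncrementer {

    @Override
    public JobParameters getNext(JobParameters parameters) {
        JobParameters current = (parameters == null) ? new JobParameters() : parameters;
        // Keep read_from_directory and the other business parameters as-is and
        // bump only the counter, so they stay constant across instances.
        long nextCount = current.getLong("run.counter", 0L) + 1;
        return new JobParametersBuilder(current)
                .addLong("run.counter", nextCount)
                .toJobParameters();
    }
}

With this registered via the job's incrementer attribute, JobOperator#startNextInstance("testJob") (or -next with CommandLineJobRunner) produces the incremented parameter set, while launching with unchanged parameters restarts the latest instance.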
Question 2: For restartability, adding restartable="true" should do the trick. FlatFileItemReader, the delegate that actually reads the files, extends AbstractItemCountingItemStreamItemReader, which saves its read position as state. As for MultiResourceItemReader, the documentation says:
Input resources are ordered using setComparator(Comparator) to make sure resource ordering is preserved between job runs in restart scenario.
So the list of resources is ordered, and each one is delegated to FlatFileItemReader, which preserves the order and the item count between runs.
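If you want that ordering to be explicit rather than implicit, setComparator(Comparator) accepts any Comparator<Resource>. A sketch in Java-config style (my illustration; the delegate and resource wiring stay as in the XML above) that mirrors the default filename ordering:

import java.util.Comparator;

import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.core.io.Resource;

public class ReaderConfig {

    public MultiResourceItemReader<Object> multiResourceItemReader(Resource[] csvFiles) {
        MultiResourceItemReader<Object> reader = new MultiResourceItemReader<>();
        reader.setResources(csvFiles);
        // Explicit filename ordering (mirrors the default comparator) so a
        // restarted execution visits the files in the same order as the failed one.
        reader.setComparator(Comparator.comparing(Resource::getFilename));
        return reader;
    }
}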
Upvotes: 0