Robert Bowen

Reputation: 487

Spring Batch: How to get errors for all lines read?

I am using FlatFileItemReader to read a file. I plug in the DefaultLineMapper and my own custom FieldSetMapper (myMapper).

Currently in myMapper, when an error occurs I simply log it. I would like to accumulate all errors, for all lines in the file, and then save them to a file.

I was considering implementing my own Tasklet. But from what I've read, it is recommended to only do this if your step is not doing chunk-oriented processing.

Another option is to use either ItemListenerSupport or ItemReadListener and implement the onReadError() method. But if I do that, I'm not sure how I could have access to a global/shared object that holds a list of all errors, for all lines.
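One way to picture the shared object mentioned above is a single collector instance that every per-line callback appends to, and that is flushed to the errors file once at the end. The sketch below is plain Java with no Spring Batch types. `LineErrorCollector`, `record`, `snapshot` and `writeTo` are hypothetical names, and the assumption is that the same instance would be injected into whatever listener receives the read errors.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical shared holder for per-line errors (not a Spring Batch class).
class LineErrorCollector {
    private final List<String> errors =
        Collections.synchronizedList(new ArrayList<>());

    // Called once per failing line, e.g. from an onReadError-style callback.
    void record(int lineNumber, String message) {
        errors.add("line " + lineNumber + ": " + message);
    }

    boolean hasErrors() {
        return !errors.isEmpty();
    }

    // Returns a copy so callers can iterate without holding the lock.
    List<String> snapshot() {
        synchronized (errors) {
            return new ArrayList<>(errors);
        }
    }

    // Writes all accumulated errors in one go; call once after the last line.
    void writeTo(Path errorsFile) throws IOException {
        Files.write(errorsFile, snapshot(), StandardCharsets.UTF_8);
    }
}
```

Because the collector is an ordinary object, it does not matter whether the errors come from the reader, the mapper or a listener, as long as they all share the one instance.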

I've been going back and forth between these 2 options trying to get them to work, without much success. Any advice much appreciated.

***** EDIT *****

I don't think my code is anything non-standard. I define the errors log as a job parameter:

Map<String, JobParameter> jobParametersMap ...
jobParametersMap.put("errorsFile", new JobParameter(errorsFileURI));

My xml config looks like this:

<job ...>
  <step ...>
  <step id="import">
    <tasklet>
      <chunk reader="importReader" writer="importWriter" .../>
    </tasklet>
  </step>
</job>

<bean id="importReader" class="MyImportReader" scope="step">
  <property name="resource" .../>
  <property name="lineMapper">
    <bean class = "...DefaultLineMapper">
      ...
      <property name="fieldSetMapper" ref="importMapper"/>
    </bean>
  </property>
  <property name="errorsFile" value="#{jobParameters['errorsFile']}"/>
</bean>

<bean id="importWriter" ...scope="step">
  ...
  <property name="errorsFile" value="#{jobParameters['errorsFile']}"/>
</bean>

The Reader class extends FlatFileItemReader and implements ItemReadListener. The writer implements BatchLoadableWriter and StepExecutionListener.

As you can see I pass the errorsFile to both the Reader and the Writer. The Writer has used the errorsFile for some time, whereas I just added it to the Reader. Both classes have a getter/setter for errorsFile.

The difference between them is that in the Writer, the overridden write() method validates and then writes all Items in the file, so all errors are written to the errorsFile at once. Also, if there are errors, a flag is set (hasErrors), and that flag's value is checked in the overridden afterStep() method. If it is true, ExitStatus.FAILED is returned.

Whereas with the Reader, the doRead() method is called once for each Item. If there is an error, I can write it to the errorsFile, and I could set a flag like the Writer does. But the flag will be set only for that line/Item.

So let's say I import 10 lines. The first 5 have errors, the last 5 don't. When afterRead() is called, it will check the value of the flag for the last processed Item, which had no errors, so hasErrors will be false. Not good. Or perhaps it would be better to override onReadError(). But what would cause that method to be called, an error in the Mapper?
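The flag problem described above goes away if the flag is only ever set and never cleared per item. Here is a minimal plain-Java sketch of the 10-line case (first 5 bad, last 5 good). `StickyErrorFlag` and the "BAD" prefix are made up for illustration and stand in for whatever the Reader does when a line fails to map.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers whether any item so far has failed.
class StickyErrorFlag {
    private boolean hasErrors = false;
    private final List<Integer> badLines = new ArrayList<>();

    // Invoke for every line; a good line never resets the flag.
    void check(int lineNumber, String line) {
        if (line.startsWith("BAD")) { // stand-in for a real validation failure
            hasErrors = true;
            badLines.add(lineNumber);
        }
    }

    boolean hasErrors() {
        return hasErrors;
    }

    List<Integer> badLines() {
        return badLines;
    }
}

class StickyErrorFlagDemo {
    public static void main(String[] args) {
        StickyErrorFlag flag = new StickyErrorFlag();
        for (int i = 1; i <= 10; i++) {
            flag.check(i, i <= 5 ? "BAD,row" + i : "ok,row" + i);
        }
        // The flag is still set after the five good lines.
        System.out.println(flag.hasErrors() + " " + flag.badLines());
    }
}
```

An end-of-step hook can then inspect the flag once, rather than looking at the state of the last item only.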

Something tells me implementing my own Reader, and/or having it implement ItemReadListener might not be the way to go about this. To me it seems I need to put some or all of this logic in the Reader's "parent" ... which would be ... a Tasklet? But I've read on SO and elsewhere on the net that implementing your own Tasklet to perform chunk processing isn't recommended; it should only be done for simple tasks.

I'm at a loss ...

Upvotes: 0

Views: 3141

Answers (2)

Robert Bowen

Reputation: 487

Just following up with this issue in case it can help someone else down the road.

In the end I was able to do what I wanted by implementing a custom LineMapper and in that class' mapLine(String line, int lineNumber) method, save the lineNumber to the executionContext:

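public class MyLineMapper implements LineMapper<MyPojo>,
  InitializingBean, StepExecutionListener {

  private ExecutionContext _executionContext;
  // tokenizer and fieldSetMapper are injected fields (declarations not shown)

  // Assumed: the context is picked up here, via StepExecutionListener.
  @Override
  public void beforeStep(StepExecution stepExecution) {
    _executionContext = stepExecution.getExecutionContext();
  }

  @Override
  public MyPojo mapLine(String line, int lineNumber)
    throws Exception {

    // Store the current line number so the FieldSetMapper can read it back.
    _executionContext.put("lineNumber", lineNumber);

    MyPojo myPojo = fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));
    return myPojo;
  }
  ...
}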

Since I need access to the ExecutionContext, I made the class also implement StepExecutionListener.

Then in my custom FieldSetMapper, I also implement StepExecutionListener, so I can grab the lineNumber from the ExecutionContext and use it to log errors with the line number:

public class MyFieldMapper implements FieldSetMapper<MyPojo>,
  InitializingBean, StepExecutionListener {

  private ExecutionContext _executionContext;

  @Override
  public MyPojo mapFieldSet(final FieldSet fieldSet)
    throws BindException {

    // Line number stored by MyLineMapper; "-" if it is not available.
    String currentLineNumber =
      (_executionContext.get("lineNumber") != null)
        ? String.valueOf(_executionContext.get("lineNumber")) : "-";

    if (/* some kind of error */) {
      logError(currentLineNumber, errorMsg);
    }
    ...
  }
  ...
}

I then check for the existence of the errors file in the beforeWrite() method of my Writer. If it exists, that means some kind of error occurred while reading/validating, and I throw an Exception.
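The existence check could look roughly like the following. It is a plain-Java sketch, not the actual Writer: `ErrorsFileGuard` and `failIfErrorsExist` are hypothetical names, and IllegalStateException is an assumption, since all that is specified is that some Exception is thrown.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical guard a writer could call before it writes anything.
class ErrorsFileGuard {
    // Throws if the errors file exists, i.e. reading/validation
    // logged at least one error earlier in the step.
    static void failIfErrorsExist(Path errorsFile) {
        if (Files.exists(errorsFile)) {
            throw new IllegalStateException(
                "Read/validation errors were logged to " + errorsFile);
        }
    }
}
```

This only works if the errors file is created lazily, on the first error. If the file is created up front, the check would have to look at its size instead.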

This way I can log all read/validation errors, for all lines of my csv file, and not exit and stop processing when the 1st error occurs.

Hope this helps someone else someday!

Upvotes: 1

emeraldjava

Reputation: 11212

I think you should consider using the step and job scope. From your reader you can save the error details to these scopes and then reference the information at a later stage. I'd be careful about recording too much information here.

http://docs.spring.io/spring-batch/reference/html/configureStep.html#step-scope

At the start of the job, you generate and name an error file and save its name to the job/step scope. If your Reader hits an error, it can write the details to that file. At the end of the process, you still have a reference to the error file name, with the recorded details.
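As a rough illustration of the "write details to the file as you go" part, here is a plain-Java sketch that appends one line per error to a file whose path was decided up front. `ErrorFileAppender` is a hypothetical class, and how the path reaches it (job parameter, step scope, or otherwise) is left open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Hypothetical appender: the path is chosen once at job start and passed in.
class ErrorFileAppender {
    private final Path errorsFile;

    ErrorFileAppender(Path errorsFile) {
        this.errorsFile = errorsFile;
    }

    // Appends one error line, creating the file on first use.
    void append(int lineNumber, String message) {
        try {
            Files.write(errorsFile,
                Collections.singletonList("line " + lineNumber + ": " + message),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path file() {
        return errorsFile;
    }
}
```

Appending per error keeps memory use flat for large files, at the cost of one small write per bad line.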

Upvotes: 0
