RBz

Reputation: 935

Spring Batch - Read from DB - Transform - And write to file

I am exploring Spring Batch and I have a problem statement which requires me to read from a DB, transform the data into comma-separated form, and write it to a file. I have around 50 different queries and as many files to create. A few of these queries return huge data sets, which could make my files large. I am solving this with Spring Batch and have a few general questions about it.

  1. Can a field extractor be used when I need to transform a particular field value?

DelimitedLineAggregator<StudentDTO> lineAggregator = new DelimitedLineAggregator<>();
BeanWrapperFieldExtractor<StudentDTO> extractor = new BeanWrapperFieldExtractor<>();
extractor.setNames(new String[] {"name", "emailAddress", "purchasedPackage"});
lineAggregator.setFieldExtractor(extractor);

For example, if I need to do something like studentDto.getName().replace("a", ""), should I go for a custom processor in such cases?

  2. Is one job with 50 steps and parallel processing an apt way to go about this scenario?
  3. Writing the header at the top of the file instead of using FlatFileHeaderCallback: is the below way of writing to the file acceptable?

@Override
public ExitStatus afterStep(StepExecution stepExecution) {
   if (stepExecution.getStatus() == BatchStatus.COMPLETED) {
      Path path = Paths.get("encryptedTextFileThreaded.txt");
      try (BufferedWriter fileWriter = Files.newBufferedWriter(path)) {
         fileWriter.write("headerString");
         fileWriter.newLine();
         for (Line line : studentDtoLines) {
            fileWriter.write(line.getLine());
            fileWriter.newLine();
         }
         fileWriter.write("footerString");
      }
      catch (Exception e) {
         log.error("Fatal error: error occurred while writing {} file", path.getFileName());
      }
   }
   return stepExecution.getExitStatus();
}
    
   

  4. Multi-threaded steps are for speeding up a single step. If I have a job with 50 steps and none of the steps depends on another, then parallel processing can be employed to speed up the execution of the job. True? Does this mean Spring Batch can create 50 threads and run all of them in parallel?

Upvotes: 1

Views: 2050

Answers (1)

Mahmoud Ben Hassine

Reputation: 31730

  1. Can a field extractor be used when I need to transform a particular field value? Should I go for a custom processor in such cases?

I would use a processor for data transformation. That's a typical use case for an item processor. It is a good practice to make each component do one thing (and do it well): the field extractor to extract fields and an item processor to do the transformation. This is better for testing and reusability.
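
For illustration, a minimal processor sketch, assuming a StudentDTO with getName/setName as in your snippet (the class name is a placeholder):

import org.springframework.batch.item.ItemProcessor;

// Sketch: strips "a" from the name before the item reaches the writer,
// keeping the transformation out of the field extractor.
public class StudentNameProcessor implements ItemProcessor<StudentDTO, StudentDTO> {

    @Override
    public StudentDTO process(StudentDTO student) {
        student.setName(student.getName().replace("a", ""));
        return student;
    }
}

Registered on the chunk-oriented step via .processor(...), it runs between the reader and the writer, so the field extractor only ever sees already-transformed values.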

  2. Is one job with 50 steps and parallel processing an apt way to go about this scenario?

IMO a job for each file is a better choice for restartability reasons. When a file's processing fails, it is better (and cleaner) to restart the failed job for that specific file rather than restarting the same job and skipping 49 steps. You can always run multiple jobs in parallel by using an appropriate task executor on the JobLauncher, as sketched below.
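
A minimal sketch of such a launcher, using the Spring Batch 4-style SimpleJobLauncher (bean names are placeholders):

import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// Sketch: a launcher that starts each job on its own thread, so several
// file-export jobs can run concurrently. Use a ThreadPoolTaskExecutor
// instead if you want to bound the number of concurrent jobs.
@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}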

  3. Writing the header at the top of the file instead of using FlatFileHeaderCallback: is the below way of writing to the file acceptable?

No, that's a wrong usage of a listener. I would use a header/footer callback for header/footer writing and a chunk oriented step to write the content of the file.
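
For example, a sketch of a writer with header and footer callbacks (the file name and header/footer strings are placeholders; the line aggregator is assumed to be the one from your question):

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.LineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

// Sketch: the writer itself emits the header and footer, while the
// chunk-oriented step writes the items. No step listener involved.
@Bean
public FlatFileItemWriter<StudentDTO> studentWriter(LineAggregator<StudentDTO> lineAggregator) {
    FlatFileItemWriter<StudentDTO> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("students.csv"));
    writer.setLineAggregator(lineAggregator);
    writer.setHeaderCallback(w -> w.write("headerString"));
    writer.setFooterCallback(w -> w.write("footerString"));
    return writer;
}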

  4. Multi-threaded steps are for speeding up a single step. If I have a job with 50 steps and none of the steps depends on another, then parallel processing can be employed to speed up the execution of the job. True? Does this mean Spring Batch can create 50 threads and run all of them in parallel?

That's correct. The degree of parallelism is configurable in the TaskExecutor you set on the parallel flow. See Parallel steps for more details.
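
For illustration, a sketch of two independent steps run in parallel via a split, using Spring Batch 4-style builders (extend the same pattern to more steps; names are placeholders):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// Sketch: independent steps wrapped in flows and executed via a split.
// The task executor controls the degree of parallelism; prefer a bounded
// ThreadPoolTaskExecutor over 50 unbounded threads.
@Bean
public Job exportJob(JobBuilderFactory jobs, Step step1, Step step2) {
    Flow flow1 = new FlowBuilder<SimpleFlow>("flow1").start(step1).build();
    Flow flow2 = new FlowBuilder<SimpleFlow>("flow2").start(step2).build();

    Flow splitFlow = new FlowBuilder<SimpleFlow>("split")
            .split(new SimpleAsyncTaskExecutor())
            .add(flow1, flow2) // one flow per independent step
            .build();

    return jobs.get("exportJob")
            .start(splitFlow)
            .end()
            .build();
}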

Upvotes: 1
