guilhermerama

Reputation: 750

Tips to improve Simple Spring Batch Job Performance

I am working with a Spring Batch application for the first time, and since the framework is very flexible, I have a few questions on performance and on best practices for implementing jobs that I couldn't find clear answers to in the Spring docs.

My Goals:

  1. read an ASCII file with fixed-column-length values, sent by a third party with a previously specified layout (STEP 1 reader)

  2. validate the read values and log the errors to a file (custom messages)

  3. Apply some business logic in the processor to filter out any undesirable lines (STEP 1 processor)

  4. write the valid lines to an Oracle database (STEP 1 writer)

  5. After the previous step finishes, update a table in the database with the STEP 1 finish timestamp (STEP 2 tasklet)

  6. Send an email when the job stops, with a summary of the quantities already processed, the errors and written lines, and the start and finish times (is this information in the JobRepository meta-data?)
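For goals 5 and 6, the two steps and the email summary can be wired as a hedged sketch like the one below, using the Spring Batch 5 builder API. The bean and step names (`loadStep`, `timestampStep`, `emailListener`) are assumptions for illustration; the counts and timestamps the listener needs (read count, write count, filter count, start/end time) are indeed available from the `StepExecution` objects persisted in the JobRepository meta-data.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;

public class JobConfig {

    // Sketch only: step and listener beans are assumed to be defined elsewhere.
    @Bean
    public Job loadJob(JobRepository jobRepository,
                       Step loadStep,
                       Step timestampStep,
                       JobExecutionListener emailListener) {
        return new JobBuilder("loadJob", jobRepository)
            .start(loadStep)           // STEP 1: read/validate/filter/write
            .next(timestampStep)       // STEP 2: tasklet updating the finish timestamp
            .listener(emailListener)   // afterJob() can read counts from the StepExecutions
            .build();
    }
}
```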

Assumptions:

  1. The file is incremental, so the third party always sends the prior file's lines (possibly with some changed values) plus any new lines (~120 million lines in total). A new file is sent every 6 months.
  2. We must validate input file lines while processing (are the required values present? can some be converted to numbers and dates?)
  3. The job must be stoppable/restartable, since it is intended to run in a time window.
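The per-line validation in assumption 2 can be sketched in plain Java; the field names, the `yyyyMMdd` date format, and the error-message wording are assumptions for illustration:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Minimal validation sketch: fields arrive as raw strings from the reader.
public class LineValidator {

    static final DateTimeFormatter DATE_FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Returns null when valid, otherwise a human-readable error message
    // suitable for the custom log described in goal 2.
    public static String validate(String id, String amount, String date) {
        if (id == null || id.trim().isEmpty()) {
            return "missing required field: id";
        }
        try {
            Long.parseLong(amount.trim());
        } catch (NumberFormatException e) {
            return "amount is not a number: " + amount;
        }
        try {
            LocalDate.parse(date.trim(), DATE_FMT);
        } catch (DateTimeParseException e) {
            return "date is not yyyyMMdd: " + date;
        }
        return null;
    }
}
```

Returning a message instead of throwing keeps the validation cost predictable per line, which matters at the ~120 million line scale.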

What I am planning to do:

To gain some read/write performance, I am avoiding Spring's out-of-the-box reflection-based beans and using a JdbcBatchItemWriter to write the processed lines to the database.
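A hedged sketch of that writer, using the `JdbcBatchItemWriterBuilder` available in Spring Batch 4+. The table name, column names, and the `ParsedLine` item type are assumptions for illustration:

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.context.annotation.Bean;

public class WriterConfig {

    // Batches all items of a chunk into one JDBC batch statement.
    @Bean
    public JdbcBatchItemWriter<ParsedLine> databaseWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<ParsedLine>()
            .dataSource(dataSource)
            .sql("INSERT INTO target_table (amount, event_date) VALUES (:amount, :eventDate)")
            .beanMapped()  // named parameters resolved from the item's getters
            .build();
    }
}
```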

The file reader reads the lines with a custom FieldSetMapper and maps all the columns with the FieldSet.readString method (this implies no ParseException on reading). A bean injected into the processor performs parsing and validation; this way we avoid skip-handling exceptions during reading, which seems an expensive operation, and we can count the invalid lines to pass on to future steps, saving the info in the step/job execution context.
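The read-as-strings-only idea can be shown in plain Java without the Spring classes: slice the fixed-width line into raw strings and defer all parsing to the processor. The column boundaries below (id at 0-10, amount at 10-20, date at 20-28) are assumptions for illustration; in the real job a FixedLengthTokenizer would produce the equivalent FieldSet.

```java
// Plain-Java sketch of the readString-only mapping: no parsing,
// so no ParseException can occur at read time.
public class FixedWidthSlicer {

    // Assumed layout: id [0,10), amount [10,20), date [20,28).
    public static String[] slice(String line) {
        String id = line.substring(0, 10).trim();
        String amount = line.substring(10, 20).trim();
        String date = line.substring(20, 28).trim();
        return new String[] { id, amount, date };
    }
}
```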

The processor bean should convert the object that was read and return a wrapper with the original object, the parsed values (i.e., Dates and Longs), the first exception thrown during parsing (if any), and a boolean indicating whether the validation was successful. After the parsing, another custom processor checks whether the record should be inserted into the database by querying for similar or identical records already inserted. In the worst case, this business rule could imply one database query per valid line.
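The wrapper described above might look like the following plain-Java sketch; the field layout (raw strings in, a Long amount and a LocalDate out) and the `yyyyMMdd` format are assumptions:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of the processor's wrapper: original raw values, parsed values,
// the first parsing failure (if any), and a validity flag.
public class ParsedLine {

    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    private final String[] original;   // raw strings from the reader
    private Long amount;               // parsed values
    private LocalDate eventDate;
    private Exception firstError;      // first parsing failure, if any
    private boolean valid = true;

    public ParsedLine(String[] original) {
        this.original = original;
        try {
            this.amount = Long.valueOf(original[1].trim());
            this.eventDate = LocalDate.parse(original[2].trim(), FMT);
        } catch (Exception e) {
            this.firstError = e;       // only the first failure is kept
            this.valid = false;
        }
    }

    public boolean isValid() { return valid; }
    public Exception getFirstError() { return firstError; }
    public Long getAmount() { return amount; }
    public LocalDate getEventDate() { return eventDate; }
    public String[] getOriginal() { return original; }
}
```

A second processor can then inspect `isValid()` and return null to filter the item out of the chunk, since Spring Batch drops items for which an ItemProcessor returns null.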

A JdbcBatchItemWriter discards the null values returned by the processors and writes the valid records to the database.

So, the real questions regarding batch processing:

What performance tips could I apply to improve the batch performance? In a preliminary attempt, loading a perfectly valid mock input file into the database took 15 hours, even without querying the database to verify whether each processed record should be inserted. What could be the simplest solution for local processing?

Upvotes: 3

Views: 8002

Answers (1)
