Reputation: 750
I am working with a Spring Batch application for the first time, and since the framework is very flexible, I have a few questions about performance and best practices for implementing jobs, to which I couldn't find clear answers in the Spring docs. The requirements are:
1. Read an ASCII file with fixed-column-length values, sent by a third party with a previously specified layout (STEP 1 reader).
2. Validate the read values and log the errors with custom messages.
3. Apply some business logic in the processor to filter out any undesirable lines (STEP 1 processor).
4. Write the valid lines to an Oracle database (STEP 1 writer).
5. After the previous step finishes, update a table in the database with the STEP 1 finish timestamp (STEP 2 tasklet).
6. When the job stops, send an email with a summary of the quantities already processed, the errors, the written lines, and the start and finish times. (Is this information in the jobRepository meta-data?)
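Regarding the last point: the read/write/skip counts asked about are indeed recorded per StepExecution in the job repository, and a JobExecutionListener can collect them after the job ends. A rough sketch (the sendSummaryMail helper is hypothetical; the real sending, e.g. via JavaMailSender, is omitted):

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.StepExecution;

// Sketch: builds the e-mail summary from job-repository meta-data.
public class SummaryMailListener implements JobExecutionListener {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // nothing to do before the job
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        StringBuilder summary = new StringBuilder()
            .append("Status: ").append(jobExecution.getStatus()).append('\n')
            .append("Start:  ").append(jobExecution.getStartTime()).append('\n')
            .append("End:    ").append(jobExecution.getEndTime()).append('\n');
        for (StepExecution step : jobExecution.getStepExecutions()) {
            summary.append(step.getStepName())
                   .append(": read=").append(step.getReadCount())
                   .append(", written=").append(step.getWriteCount())
                   .append(", filtered=").append(step.getFilterCount())
                   .append(", skipped=").append(step.getSkipCount())
                   .append('\n');
        }
        sendSummaryMail(summary.toString());
    }

    private void sendSummaryMail(String body) {
        // hypothetical helper: send the summary, e.g. with JavaMailSender
    }
}
```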
To gain some performance on reading and writing, I am avoiding Spring's out-of-the-box reflection-based beans and using a JdbcBatchItemWriter to write the processed lines to the database.
The FlatFileItemReader reads the lines with a custom FieldSetMapper that transforms all the columns with the FieldSet.readString method (which implies no ParseException on reading). A bean injected into the processor performs the parsing and validation; this way we avoid skip-handling exceptions during reading, which seems to be an expensive operation, and we can count the invalid lines to pass on to future steps, saving the info in the step/job execution context.
The processor bean converts the object read and returns a wrapper holding the original object, the parsed values (i.e., Dates and Longs), the first exception thrown during parsing, if any, and a boolean indicating whether validation was successful. After the parsing, another custom processor checks whether the register should be inserted into the database by querying for similar or identical registers already inserted. In the worst case, this business rule implies one database query per valid line.
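The wrapper just described could be sketched as the plain-Java class below; the field names and the yyyyMMdd date format are assumptions:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch of the wrapper: keeps the raw values, the parsed values,
// the first parsing exception (if any), and a validity flag.
public class ParsedRecord {
    private final String rawDate;
    private final String rawAmount;
    private LocalDate date;
    private Long amount;
    private Exception firstError;
    private boolean valid = true;

    public ParsedRecord(String rawDate, String rawAmount) {
        this.rawDate = rawDate;
        this.rawAmount = rawAmount;
        try {
            // assumed layout: dates arrive as yyyyMMdd
            this.date = LocalDate.parse(rawDate, DateTimeFormatter.ofPattern("yyyyMMdd"));
        } catch (DateTimeParseException e) {
            recordError(e);
        }
        try {
            this.amount = Long.parseLong(rawAmount.trim());
        } catch (NumberFormatException e) {
            recordError(e);
        }
    }

    private void recordError(Exception e) {
        if (firstError == null) {
            firstError = e;   // keep only the first exception, per the requirement
        }
        valid = false;
    }

    public boolean isValid() { return valid; }
    public Exception getFirstError() { return firstError; }
    public LocalDate getDate() { return date; }
    public Long getAmount() { return amount; }
    public String getRawDate() { return rawDate; }
    public String getRawAmount() { return rawAmount; }
}
```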
Items for which the processor returns null are filtered out by the framework before reaching the writer, and a JdbcBatchItemWriter writes the valid registers to the database.
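A minimal sketch of that writer, assuming a hypothetical ValidRegister item and table/column names; the getters must match the named SQL parameters:

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;

// Hypothetical item written to the database
public class ValidRegister {
    private String customerId;
    private java.sql.Date operationDate;
    private Long amount;

    public String getCustomerId() { return customerId; }
    public java.sql.Date getOperationDate() { return operationDate; }
    public Long getAmount() { return amount; }
}

class WriterConfig {
    // Spring calls afterPropertiesSet() when this is registered as a bean
    static JdbcBatchItemWriter<ValidRegister> writer(DataSource dataSource) {
        JdbcBatchItemWriter<ValidRegister> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        writer.setSql("INSERT INTO REGISTERS (CUSTOMER_ID, OPERATION_DATE, AMOUNT) "
                    + "VALUES (:customerId, :operationDate, :amount)");
        writer.setItemSqlParameterSourceProvider(
                new BeanPropertyItemSqlParameterSourceProvider<ValidRegister>());
        return writer;
    }
}
```

Since the writer issues one JDBC batch per chunk, the chunk commit interval effectively controls the batch size sent to Oracle.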
What performance tips could I apply to improve the batch performance? In a preliminary attempt, loading a perfectly valid mock input file into the database took 15 hours of processing, and that was without querying the database to verify whether each processed register should be inserted. What could be the simplest solution for local processing?
Upvotes: 3
Views: 8002
Reputation: 131
Have you seen partitioning? http://docs.spring.io/spring-batch/reference/html/scalability.html Remote chunking in Spring Batch, with the control kept on the reader, may also be helpful.
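For a single large file processed locally, one common approach is a partitioned step that splits the input into line ranges handled by parallel worker steps. A rough sketch of a custom Partitioner (the keys and the pre-counted totalLines are assumptions; each worker's reader would consume them, e.g. via setLinesToSkip/setMaxItemCount on its FlatFileItemReader):

```java
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Sketch: splits a file of totalLines lines into gridSize line ranges,
// one ExecutionContext per worker step.
public class LineRangePartitioner implements Partitioner {

    private final int totalLines;

    public LineRangePartitioner(int totalLines) {
        this.totalLines = totalLines;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int linesPerPartition = (totalLines + gridSize - 1) / gridSize; // ceiling division
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("startLine", i * linesPerPartition);
            context.putInt("maxItems", linesPerPartition);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```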
Upvotes: 1