queeg

Reputation: 9473

How to distribute work correctly in JSR-352?

So I have been using Java Batch Processing for some time now. My jobs were either import/export jobs that chunked from a reader to a writer, or Batchlets that did some more complex processing. Since I am beginning to hit memory limits, I need to rethink the architecture.

So I want to make better use of the chunked Reader/Processor/Writer pattern, but I am unsure how to distribute the work over the three components. Only during processing does it become clear whether zero, one, or several records have to be written for an input item.

The reader is quite clear: It reads the data to be processed from the DB. But I am unsure how to write the records back to the database. I see these options:

Which way would be the best for this kind of task?

Upvotes: 0

Views: 158

Answers (2)

cheng

Reputation: 1138

Typically, an item processor processes an input item passed from an item reader, and the processing result is either null or a domain object. So it is not a natural fit for cases like yours, where the processing result may be split into multiple objects. I would assume that even in your case, getting multiple objects from one processing iteration is not common. So I would suggest using a list or another collection type as the type of the processed object only when necessary. In the other, more common cases, the item processor still returns null (to skip the current item) or a single domain object.

When the item writer iterates through the accumulated items, it can check whether an item is a collection and, if so, write out all contained elements. A plain domain object is just written as usual.
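The check described above could look roughly like the sketch below. It is plain Java without the batch API on the classpath, and `ChunkItems` and `flatten` are hypothetical names chosen for illustration; in a real step, the body of `ItemWriter.writeItems(List<Object> items)` would call something like this before doing the actual database writes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Hypothetical helper for an item writer: expands collection-typed items
 * into their elements and passes single domain objects through unchanged.
 */
final class ChunkItems {

    /** Returns the records to write, in the order the processor produced them. */
    static List<Object> flatten(List<Object> items) {
        List<Object> records = new ArrayList<>();
        for (Object item : items) {
            if (item instanceof Collection<?>) {
                // The processor split one input item into several records (or none).
                records.addAll((Collection<?>) item);
            } else {
                // Ordinary case: one domain object per input item.
                records.add(item);
            }
        }
        return records;
    }
}
```

The writer would then loop over the flattened records and persist each one with whatever JDBC or JPA code it already uses.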

Using a non-JTA datasource for the reader is fine. I think you would want to keep the reader connection open from start to end so that you can keep reading from the result set. In an item writer, the connection is typically acquired at the beginning of the write operation and closed at the end of the chunk transaction, on commit or rollback.

Some resources that may be of help:

Jakarta Batch API, jberet-support JdbcItemReader, jberet-support JdbcItemWriter

Upvotes: 1

queeg

Reputation: 9473

Looking at https://www.ibm.com/support/pages/system/files/inline-files/WP102706_WLB_JSR352.002.pdf, especially the chapters Chunk/The Processor and Chunk/The Writer, it becomes obvious that it is up to me.

The processor can return an object, and the writer will have to understand and write this object. So for the above case where the processor has zero, one or many items to write per input record, it should simply return a list. This list can contain zero, one or several elements. The writer has to understand the list and write its elements to the database.
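A minimal sketch of that split, under some assumptions: `SplittingProcessor` and `ListWriter` are made-up names, the comma-separated input is only a stand-in for real records, and the `written` list stands in for the database. The method names mirror `ItemProcessor.processItem` and `ItemWriter.writeItems` from the batch API, but the classes are plain Java so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical processor logic: turns one comma-separated input line into
 * zero, one, or several trimmed records and always returns them as a list.
 */
final class SplittingProcessor {

    public Object processItem(Object item) {
        List<String> records = new ArrayList<>();
        for (String part : ((String) item).split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        // May be empty, but is never null, so the writer still sees it.
        return records;
    }
}

/** Hypothetical writer logic: unpacks each list it receives for the chunk. */
final class ListWriter {

    // Stand-in for the database; a real writer would issue inserts here.
    final List<Object> written = new ArrayList<>();

    public void writeItems(List<Object> items) {
        for (Object item : items) {
            // Every item is a list produced by SplittingProcessor.
            written.addAll((List<?>) item);
        }
    }
}
```

Note that an empty list is not the same as returning null: a null result filters the item out before it reaches the writer, while an empty list reaches the writer and simply contributes nothing.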

Since the logic is divided this way, the code is still pluggable and can easily be extended or maintained.

Addendum: Since both reader and writer connect to the same database this time, I ran into the problem that the commit for each chunk also invalidated the reader's connection. The solution was to use a non-JTA datasource for the reader.

Upvotes: 1
