Jeff Cook

Reputation: 8774

How to read multiple tables using Spring Batch

I am looking to read data from multiple tables (in different databases), aggregate the results, and create a final result set. In my case, each query returns a list of objects. I searched the web many times and found no link other than Spring Batch How to read multiple table (queries) as Reader and write it as flat file write, but that approach returns only a single object.

Is there any way we can do this? Any working sample example would help a lot.

Example -

Now I want to build the Employee and Department relationship and send the combined object to a processor for a further lookup against MongoDB, then send the final object to the writer.
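
For illustration, the models could look something like this (field names are just placeholders, not my actual schema):

```java
// Rough shape of the models involved; field names are placeholders only.
record Employee(long id, String name, long deptId) {}
record Department(long id, String name) {}

// The combined object: an employee joined with its department, to be
// enriched further against MongoDB in the processor.
record EmployeeDepartment(Employee employee, Department department) {}
```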

Upvotes: 2

Views: 5081

Answers (2)

Atul Lohiya

Reputation: 81

@jeff-cook There are a few things that you can use together to achieve what you want. I have not tried these myself, but you can certainly try and let Stack Overflow know if it worked. I came across this very late, but if you still have this requirement, this might help.

  1. To read multiple tables, you can use a CompositeItemReader. You can add one reader per table to the composite reader, each with its own query. Each of these readers would have a row mapper that returns a specific model, such as Employee or Department. Then, in the DefaultUnifyingStringItemsMapper of the composite reader, you can combine the two records into one using a parent model, say EmployeeDepartment, and store both records in it. At this point you have created the Employee and Department relationship you want (see the first sketch after this list).

Note: Because of the way Spring Batch works, this will NOT give you a list; instead, each individual row gets combined into a relationship. By default, each combined item is passed to the processor, where you can fetch further details per row. Once all the items of a chunk have been processed, a list becomes available in the writer, which can then write it the way you want.

  2. Just like in point 1 above, you can also fetch the details from Mongo with its own row mapper, then combine all three records into one relationship in the DefaultUnifyingStringItemsMapper. This can then be written as you want after being processed.

  3. In both approaches above, you will need to create a separate DataSource for each type of database you want to read from and assign it to the respective reader. Note that if you are explicitly assigning a transaction manager to the step, the DataSource passed to it should be the same instance you passed to the writer; otherwise you will see data integrity issues on rollback. This is how Spring Batch works; it has nothing to do with the approach above.

  4. If you need to validate the data from the three tables, you can use a ValidatingItemProcessor and throw a validation exception when a row fails your checks. If you want the entire processing to fail when this exception is thrown, you can leave it at that. If you instead want only the failing item to be skipped and processing to continue with the remaining items, set a skip policy when creating the step by calling faultTolerant().skipPolicy(new AlwaysSkipItemSkipPolicy()), or whichever policy suits you (see the second sketch after this list).

  5. If you must have a full list before you can fetch anything from MongoDB, you may try creating a List in the job context, appending items to it after each read, and accessing the list in the processor. I doubt this will actually work, though, since the reader and processor are called one item at a time; the output of the processor is collected into a chunk and only then passed to the writer. If this does not work for you, you may try reading from Mongo in the beforeWrite method of an ItemWriteListener, but that is not the right place to do all this.
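
For point 1, since I have not tried the composite reader myself, here is the general shape of a hand-rolled merging reader as a fallback. All class names are illustrative, and it assumes both queries are ordered so that row N of each delegate belongs to the same relationship:

```java
import org.springframework.batch.item.ItemReader;

record Employee(long id, String name) {}
record Department(long id, String name) {}
record EmployeeDepartment(Employee employee, Department department) {}

// Illustrative only: merges one row from each delegate reader into a
// combined EmployeeDepartment item, building the relationship per row.
class MergingItemReader implements ItemReader<EmployeeDepartment> {

    private final ItemReader<Employee> employeeReader;
    private final ItemReader<Department> departmentReader;

    MergingItemReader(ItemReader<Employee> employeeReader,
                      ItemReader<Department> departmentReader) {
        this.employeeReader = employeeReader;
        this.departmentReader = departmentReader;
    }

    @Override
    public EmployeeDepartment read() throws Exception {
        Employee employee = employeeReader.read();
        Department department = departmentReader.read();
        if (employee == null || department == null) {
            return null; // a null item signals the end of the step's input
        }
        return new EmployeeDepartment(employee, department);
    }
}
```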
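
And for point 4, a sketch of a step configuration with the skip policy, in Spring Batch 5 style. The surrounding beans (jobRepository, transactionManager, reader, writer) are assumed, and EmployeeDepartment is the record from the previous sketch:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.validator.ValidatingItemProcessor;
import org.springframework.batch.item.validator.ValidationException;
import org.springframework.transaction.PlatformTransactionManager;

public class StepConfig {

    public Step employeeStep(JobRepository jobRepository,
                             PlatformTransactionManager transactionManager,
                             ItemReader<EmployeeDepartment> reader,
                             ItemWriter<EmployeeDepartment> writer) {
        // ValidatingItemProcessor rejects items by throwing ValidationException;
        // the skip policy below skips the failing item instead of failing the step.
        ValidatingItemProcessor<EmployeeDepartment> validator =
                new ValidatingItemProcessor<>(item -> {
                    if (item.department() == null) {
                        throw new ValidationException("employee without department: " + item);
                    }
                });
        return new StepBuilder("employeeStep", jobRepository)
                .<EmployeeDepartment, EmployeeDepartment>chunk(100, transactionManager)
                .reader(reader)
                .processor(validator)
                .writer(writer)
                .faultTolerant()
                .skipPolicy(new AlwaysSkipItemSkipPolicy())
                .build();
    }
}
```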

I believe your work should be done if you follow the first four steps.

Upvotes: 0

Mahmoud Ben Hassine

Reputation: 31590

The question should rather be "how to join three tables from three different databases and write the result in a file". There is no built-in reader in Spring Batch that reads from multiple tables. You either need to create a custom reader, or decompose the problem at hand into tasks that can be implemented using Spring Batch tasklet/chunk-oriented steps.

I believe you can use the driving query pattern in a single chunk-oriented step: the reader reads employee items, then a processor enriches each item with 1) the department from Postgres and 2) other info from Mongo. This should work for small/medium datasets. If you have a lot of data, you can use partitioning to parallelize things and improve performance.
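
A minimal sketch of what that could look like, assuming simple Employee/Department models and a JdbcTemplate pointing at the Postgres database (all names here are illustrative):

```java
import javax.sql.DataSource;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.jdbc.core.JdbcTemplate;

record Employee(long id, String name, long deptId) {}
record Department(long id, String name) {}
record EnrichedEmployee(Employee employee, Department department) {}

public class DrivingQueryConfig {

    // The driving query: read employee rows only.
    public JdbcCursorItemReader<Employee> employeeReader(DataSource employeeDataSource) {
        return new JdbcCursorItemReaderBuilder<Employee>()
                .name("employeeReader")
                .dataSource(employeeDataSource)
                .sql("SELECT id, name, dept_id FROM employee")
                .rowMapper((rs, rowNum) -> new Employee(
                        rs.getLong("id"), rs.getString("name"), rs.getLong("dept_id")))
                .build();
    }

    // The processor enriches each employee with one lookup per item.
    public ItemProcessor<Employee, EnrichedEmployee> enrichingProcessor(
            JdbcTemplate departmentJdbcTemplate) {
        return employee -> {
            Department department = departmentJdbcTemplate.queryForObject(
                    "SELECT id, name FROM department WHERE id = ?",
                    (rs, rowNum) -> new Department(rs.getLong("id"), rs.getString("name")),
                    employee.deptId());
            // The second lookup, against MongoDB (e.g. with MongoTemplate), would go here.
            return new EnrichedEmployee(employee, department);
        };
    }
}
```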

Another option, if you want to avoid a query per item, is to load all departments into a cache, for example (I would guess there are fewer departments than employees), and enrich items from the cache rather than with individual queries to the database.
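
For example, something along these lines (again, names are illustrative, reusing the models from the previous sketch):

```java
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

public class CachingEnrichingProcessor implements ItemProcessor<Employee, EnrichedEmployee> {

    private final Map<Long, Department> departmentsById;

    public CachingEnrichingProcessor(JdbcTemplate departmentJdbcTemplate) {
        // One query up front instead of one query per employee; fine as long
        // as the department table is small enough to hold in memory.
        this.departmentsById = departmentJdbcTemplate.query(
                        "SELECT id, name FROM department",
                        (rs, rowNum) -> new Department(rs.getLong("id"), rs.getString("name")))
                .stream()
                .collect(Collectors.toMap(Department::id, department -> department));
    }

    @Override
    public EnrichedEmployee process(Employee employee) {
        return new EnrichedEmployee(employee, departmentsById.get(employee.deptId()));
    }
}
```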

Upvotes: 2
