Reputation: 8774
I am looking to read data from multiple tables (in different databases), aggregate the results, and create a final result set. In my case, each query returns a list of objects. I searched the web many times and found no link other than Spring Batch How to read multiple table (queries) as Reader and write it as flat file write, but that approach returns only a single object.
Is there any way to do this? Any working sample example would help a lot.
Example -
Now I want to build the Employee and Department relationship, send the combined object to a processor for a further lookup against MongoDB, and send the final object to the writer.
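For illustration, the relationship in question could look something like this (the original table definitions are not shown above, so the class and field names here are hypothetical):

```java
// Hypothetical domain classes; the actual columns are not shown in the question.
public class Department {
    private long id;
    private String name;
    // getters and setters omitted for brevity
}

public class Employee {
    private long id;
    private String name;
    private long departmentId;      // key into the department table
    private Department department;  // filled in during enrichment
    // getters and setters omitted for brevity
}
```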
Upvotes: 2
Views: 5081
Reputation: 81
@jeff-cook I came across this very late, but if you still have this requirement, this might help. There are a few things you can use together to achieve what you want. I have not tried these myself, but you can certainly try and let Stack Overflow know if it worked.
1. Note: Because of the way Spring Batch works, this will NOT give you a list; instead, each individual row gets combined into a relationship. By default, this is passed to the processor, where you can fetch further details per row. Once all the items of a chunk have been processed, a list is available in the writer, which can then write it the way you want.
2. Just like in point 1 above, you can also fetch the details from Mongo with its own row mapper (see the row mapper sketch after this list), then combine all three into one relationship in the DefaultUnifyingStringItemsMapper. This can then be processed and written the way you want.
3. In both approaches above, you will need to create a separate DataSource for each database you read from and assign it to the respective reader; a configuration sketch follows this list. Note that if you specifically assign a transaction manager to the step, the DataSource passed to it must be the same instance you passed to the writer, or you will see data integrity issues on rollback. This is how Spring Batch works; it has nothing to do with the approach above.
4. If you need to validate the data from the three tables, you can use a ValidatingItemProcessor and throw a ValidationException for rows that fail your validation (see the validation sketch below). If you want the entire step to fail when this exception is thrown, you can leave it at that. If you want only the failing item to be skipped and the remaining items to be processed, set a skip policy when creating the step, e.g. faultTolerant().skipPolicy(new AlwaysSkipItemSkipPolicy()), or whichever policy suits you.
5. If you must have a full list before you can fetch anything from MongoDB, you may try creating a List in the job's ExecutionContext, adding items to it after each read, and accessing the list in the processor. I doubt this will actually work, though, since the reader and processor are called one item at a time; the output of the processor is collected into a chunk and only then passed to the writer. If that does not work for you, you may try reading Mongo in the beforeWrite method of an ItemWriteListener, but that is not the right place to do all this.
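For points 1 and 2, this is roughly what a per-table row mapper could look like. It is only a sketch against the hypothetical Department class from the question's example; the combining mapper is not shown here:

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.jdbc.core.RowMapper;

// Maps one row of the department table onto the (hypothetical) Department class.
// Each table gets its own mapper; the mapped rows are then combined into the
// Employee/Department relationship as described in points 1 and 2.
public class DepartmentRowMapper implements RowMapper<Department> {
    @Override
    public Department mapRow(ResultSet rs, int rowNum) throws SQLException {
        Department department = new Department();
        department.setId(rs.getLong("id"));
        department.setName(rs.getString("name"));
        return department;
    }
}
```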
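For point 3, a configuration sketch assuming Spring Boot and the Spring Batch 4 builder style; URLs, credentials, and bean names are placeholders:

```java
import javax.sql.DataSource;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

@Configuration
public class MultiDataSourceConfig {

    // One DataSource per database, each assigned to its own reader.
    @Bean
    public DataSource employeeDataSource() {
        return DataSourceBuilder.create()
                .url("jdbc:postgresql://localhost:5432/hr")
                .username("user").password("pass")
                .build();
    }

    @Bean
    public DataSource departmentDataSource() {
        return DataSourceBuilder.create()
                .url("jdbc:postgresql://localhost:5432/org")
                .username("user").password("pass")
                .build();
    }

    // reader, writer, and writerDataSource are hypothetical beans defined elsewhere.
    @Bean
    public Step step(StepBuilderFactory steps,
                     ItemReader<Employee> reader,
                     ItemWriter<Employee> writer,
                     DataSource writerDataSource) {
        return steps.get("step")
                .<Employee, Employee>chunk(10)
                .reader(reader)
                .writer(writer)
                // must wrap the SAME DataSource instance the writer uses,
                // otherwise rollbacks will not cover the written data
                .transactionManager(new DataSourceTransactionManager(writerDataSource))
                .build();
    }
}
```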
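For point 4, a sketch of the validation wiring; ValidatingItemProcessor, ValidationException, and AlwaysSkipItemSkipPolicy are standard Spring Batch classes, while Employee and the surrounding beans are hypothetical:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.validator.ValidatingItemProcessor;
import org.springframework.batch.item.validator.ValidationException;
import org.springframework.batch.item.validator.Validator;
import org.springframework.context.annotation.Bean;

// Reject items that did not end up with a department during the combining step.
public class EmployeeValidator implements Validator<Employee> {
    @Override
    public void validate(Employee employee) throws ValidationException {
        if (employee.getDepartment() == null) {
            throw new ValidationException("No department for employee " + employee.getId());
        }
    }
}

// Skip only the failing items instead of failing the whole step.
// reader and writer are hypothetical beans defined elsewhere.
@Bean
public Step validatingStep(StepBuilderFactory steps,
                           ItemReader<Employee> reader,
                           ItemWriter<Employee> writer) {
    return steps.get("validatingStep")
            .<Employee, Employee>chunk(10)
            .reader(reader)
            .processor(new ValidatingItemProcessor<>(new EmployeeValidator()))
            .writer(writer)
            .faultTolerant()
            .skipPolicy(new AlwaysSkipItemSkipPolicy())
            .build();
}
```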
I believe your work should be done if you follow the first four steps.
Upvotes: 0
Reputation: 31590
The question should rather be "how to join three tables from three different databases and write the result in a file". There is no built-in reader in Spring Batch that reads from multiple tables. You either need to create a custom reader, or decompose the problem at hand into tasks that can be implemented using Spring Batch tasklet/chunk-oriented steps.
I believe you can use the driving query pattern in a single chunk-oriented step: the reader reads employee items, then a processor enriches each item with 1) the department from Postgres and 2) other info from Mongo. This should work for small/medium datasets. If you have a lot of data, you can use partitioning to parallelize things and improve performance.
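A minimal sketch of such an enrichment processor, assuming hypothetical Employee, Department, and EmployeeDetails classes and a DepartmentRowMapper; the queries, collection, and field names are placeholders:

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.jdbc.core.JdbcTemplate;

// Driving query pattern: the reader emits Employee items, and this processor
// enriches each one with its department (Postgres) and extra details (MongoDB).
public class EmployeeEnrichmentProcessor implements ItemProcessor<Employee, Employee> {

    private final JdbcTemplate postgresTemplate; // built on the Postgres DataSource
    private final MongoTemplate mongoTemplate;

    public EmployeeEnrichmentProcessor(JdbcTemplate postgresTemplate,
                                       MongoTemplate mongoTemplate) {
        this.postgresTemplate = postgresTemplate;
        this.mongoTemplate = mongoTemplate;
    }

    @Override
    public Employee process(Employee employee) {
        // 1) enrich with the department from the relational database
        Department department = postgresTemplate.queryForObject(
                "SELECT id, name FROM department WHERE id = ?",
                new DepartmentRowMapper(), employee.getDepartmentId());
        employee.setDepartment(department);

        // 2) enrich with additional details from MongoDB
        EmployeeDetails details = mongoTemplate.findOne(
                new Query(Criteria.where("employeeId").is(employee.getId())),
                EmployeeDetails.class);
        employee.setDetails(details);

        return employee;
    }
}
```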
Another option, if you want to avoid a query per item, is to load all departments into a cache, for example (I guess there are fewer departments than employees), and enrich items from the cache rather than with individual queries to the db.
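A sketch of that caching variant: load all departments once before the step and enrich from the in-memory map. The classes are the same hypothetical ones as above, and the processor needs to be registered as a listener on the step so that the @BeforeStep method is invoked:

```java
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

public class CachingEnrichmentProcessor implements ItemProcessor<Employee, Employee> {

    private final JdbcTemplate postgresTemplate;
    private final Map<Long, Department> departmentsById = new HashMap<>();

    public CachingEnrichmentProcessor(JdbcTemplate postgresTemplate) {
        this.postgresTemplate = postgresTemplate;
    }

    // Load all departments once; register this processor as a step listener
    // so this method is actually called before the step starts.
    @BeforeStep
    public void loadDepartments(StepExecution stepExecution) {
        postgresTemplate
                .query("SELECT id, name FROM department", new DepartmentRowMapper())
                .forEach(d -> departmentsById.put(d.getId(), d));
    }

    @Override
    public Employee process(Employee employee) {
        // no query per item: look up the department in the cache
        employee.setDepartment(departmentsById.get(employee.getDepartmentId()));
        return employee;
    }
}
```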
Upvotes: 2