ilj

Reputation: 869

Batch processing in JDBC gateway

My setup (simplified for clarity) is the following:

<int:inbound-channel-adapter channel="in" expression="0">
    <int:poller cron="0 0 * * * *"/>
    <int:header name="snapshot_date" expression="new java.util.Date()"/>
    <int:header name="correlationId" expression="T(java.util.UUID).randomUUID()"/>
    <!-- more here -->
</int:inbound-channel-adapter>

<int:recipient-list-router input-channel="in" apply-sequence="true">
    <int:recipient channel="data.source.1"/>
    <int:recipient channel="data.source.2"/>
    <!-- more here -->
</int:recipient-list-router>

<int:chain input-channel="data.source.1" output-channel="save">
    <int-jdbc:outbound-gateway data-source="db1" max-rows-per-poll="0">
        <int-jdbc:query>
            select * from large_dataset
        </int-jdbc:query>
    </int-jdbc:outbound-gateway>
    <int:header-enricher>
        <int:header name="source" value="data.source.1"/>
    </int:header-enricher>
</int:chain>

<int:chain input-channel="data.source.2" output-channel="save">
    <int-jdbc:outbound-gateway data-source="db1" max-rows-per-poll="0">
        <int-jdbc:query>
            select * from another_large_dataset
        </int-jdbc:query>
    </int-jdbc:outbound-gateway>
    <int:header-enricher>
        <int:header name="source" value="data.source.2"/>
    </int:header-enricher>
</int:chain>

<int:chain input-channel="save" output-channel="process">
    <int:splitter expression="T(com.google.common.collect.Lists).partition(payload, 1000)"/>
    <int:transformer>
        <int-groovy:script location="transform.groovy"/>
    </int:transformer>
    <int:service-activator expression="@db2.insertData(payload, headers)"/>
    <int:aggregator/>
</int:chain>

<int:chain input-channel="process" output-channel="nullChannel">
    <int:aggregator/>
    <int:service-activator expression="@finalProcessing.doSomething()"/>
</int:chain>

Let me explain the steps a little bit:

  1. The poller is triggered by cron; the message is enriched with some information about this run.
  2. The message is sent to multiple data-source chains.
  3. Each chain extracts data from a large dataset (100k+ rows); the result set message is marked with a source header.
  4. The result set is split into smaller chunks, transformed and inserted into db2.
  5. After all data sources have been polled, some complex processing is initiated, using the information about the run.

This configuration does the job so far, but it is not scalable. The main problem is that I have to load the full dataset into memory first and pass it along the pipeline, which might cause memory issues.

My question is: what is the simplest way to have the result set extracted from db1, pushed through the pipeline, and inserted into db2 in small batches?

Upvotes: 1

Views: 722

Answers (1)

Artem Bilan

Reputation: 121212

First of all, since version 4.0.4, Spring Integration's <splitter> supports an Iterator as the payload, to avoid memory overhead.
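
For example, something like this (just a sketch; the db1Reader bean and its stream() method are made-up names - the point is that it returns a java.util.Iterator backed by a streaming, forward-only ResultSet instead of a fully materialized List, and that downstream components then see one row per message):

<!-- hypothetical bean: @db1Reader.stream() is assumed to return a
     java.util.Iterator over the rows of large_dataset, backed by a
     streaming, forward-only ResultSet -->
<int:service-activator input-channel="data.source.1"
                       output-channel="toSplit"
                       expression="@db1Reader.stream()"/>

<!-- since 4.0.4 a plain splitter walks an Iterator payload lazily,
     emitting one message per row instead of holding the whole result set -->
<int:splitter input-channel="toSplit" output-channel="save"/>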

We have a test case for JDBC which shows that behaviour. But, as you can see, it is based on the Spring Integration Java DSL and Java 8 lambdas. (Yes, it can be done for older Java versions, without lambdas, too.) Even if this case is appropriate for you, your <aggregator> should not be in-memory, because it collects all the messages in its MessageStore.
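
For that matter the aggregators can be backed by a persistent store, e.g. (a sketch; it assumes the standard spring-integration-jdbc schema scripts have been applied, and targetDataSource is an illustrative name for a DataSource bean pointing at that database):

<!-- a persistent store keeps the aggregator's correlation groups off the heap;
     requires the INT_MESSAGE* tables from the spring-integration-jdbc scripts -->
<bean id="jdbcMessageStore"
      class="org.springframework.integration.jdbc.store.JdbcMessageStore">
    <constructor-arg ref="targetDataSource"/>
</bean>

<!-- replaces the bare <int:aggregator/> elements in your chains -->
<int:aggregator message-store="jdbcMessageStore"/>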

That's the first case.

Another option is based on a paging algorithm, where your SELECT accepts a pair of WHERE params in your DB dialect. For Oracle it can look like: Paging with Oracle, where the pageNumber is some message header - :headers[pageNumber].
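
A sketch of such a paged SELECT in the gateway (the page size of 1000 and the id ordering column are just assumptions; note the XML-escaped comparison operators):

<!-- sketch: pageNumber starts at 1; the ORDER BY must be deterministic
     so that consecutive pages do not overlap -->
<int-jdbc:outbound-gateway data-source="db1">
    <int-jdbc:query>
        select *
        from (select t.*, ROWNUM rn
              from (select * from large_dataset order by id) t
              where ROWNUM &lt;= :headers[pageNumber] * 1000)
        where rn &gt; (:headers[pageNumber] - 1) * 1000
    </int-jdbc:query>
</int-jdbc:outbound-gateway>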

After that you do a trick with a <recipient-list-router> to send the SELECT result to the save channel and also to some other channel which increments the pageNumber header value and sends the message back to the data.source.1 channel, and so on. When the pageNumber goes beyond the data scope, the <int-jdbc:outbound-gateway> stops producing results.
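
A sketch of that loop (the page.result and next.page channel names are illustrative; the initial pageNumber header, e.g. 1, would be set in the <int:inbound-channel-adapter> next to snapshot_date):

<!-- fan out each page: one copy goes to 'save', the other loops back
     with an incremented pageNumber to select the next page -->
<int:recipient-list-router input-channel="page.result">
    <int:recipient channel="save"/>
    <int:recipient channel="next.page"/>
</int:recipient-list-router>

<int:header-enricher input-channel="next.page" output-channel="data.source.1">
    <int:header name="pageNumber" overwrite="true"
                expression="headers['pageNumber'] + 1"/>
</int:header-enricher>

When a page SELECT returns no rows there is simply no reply message (with requires-reply="false" on the gateway), so the loop ends by itself.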

Something like that.

I don't say that it is so easy, but it should be a starting point for you, at least.

Upvotes: 1
