Bibek Kr. Bazaz

Reputation: 545

Mule 4: Design: How to process data (files/database records) in Mule 4 without getting an "out-of-memory" error?

Scenario: I have a database that contains 100k records, which take up about 10 GB in memory. My objective is to read all of these records, segregate them into groups, and write each group out to files without running out of memory.

To achieve this, I am thinking of the design as follows:

{
   "group_1" : [...],
   "group_2" : [...]

}
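
For illustration, a grouping of this shape could be produced in a Transform Message step with DataWeave's groupBy; the category field below is just a hypothetical grouping key, not a column from my actual schema:

%dw 2.0
output application/json
---
// Group the incoming records by a hypothetical "category" column,
// producing keys such as "group_1", "group_2", ...
payload groupBy ((record) -> "group_" ++ (record.category as String))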

Questions/Concerns :

Case 1: When reading from the database, the select operation loads all 100k records into memory. Question: How can I optimize this step so that I still get all 100k records to process, but without a spike in memory usage?

Case 2: When segregating the data, I store the isolated data in the aggregator object inside the reduce operator, and that object stays in memory until I write it into files. Question: Is there a way to segregate the data and write it directly to files in the batch aggregator step, and then quickly free the memory held by the aggregator object?

Please treat this as a design question for Mule 4 flows and help me. Thanks to the community for your help and support.

Upvotes: 0

Views: 961

Answers (1)

aled

Reputation: 25699

  1. Don't load 100K records into memory. Loading high volumes of data into memory will probably cause an out-of-memory error. You are not providing details of your configuration, but the database connector streams pages of records by default, so that part is taken care of. Use the fetchSize attribute to tune the number of records read per page; the default is 10. The batch scope uses disk space to buffer data, to avoid using RAM. It also has parameters to help tune the number of records processed per step, for example the batch block size and the batch aggregator size. Using default values you would not be anywhere near 100K records in memory. Also be sure to control concurrency to limit resource usage (see the configuration sketch after this list).

Note that even after reducing all of these configurations, it doesn't mean there will be no spike when processing. Any processing consumes resources. The idea is to have a predictable, controlled spike instead of an uncontrolled one that can exhaust available resources.

  2. This question is not clear. You can't control the aggregator memory other than through the aggregator size, but it looks like it only keeps the most recently aggregated records, not all the records. Are you having any problems with that, or is this a theoretical question?
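
To make the first point concrete, here is a minimal, illustrative Mule 4 configuration sketch. The flow name, Database_Config, the SQL query, the output path and all numeric values are placeholders; they only show where fetchSize, blockSize, maxConcurrency and the aggregator size are tuned. A real flow would also have a trigger and a transform to serialize the aggregated records before writing:

<flow name="recordsFlow">
    <!-- Streams pages of rows; fetchSize controls how many rows are read per page -->
    <db:select config-ref="Database_Config" fetchSize="200">
        <db:sql>SELECT * FROM records</db:sql>
    </db:select>

    <!-- blockSize and maxConcurrency limit how many records are in memory at once -->
    <batch:job jobName="recordsBatchJob" blockSize="100" maxConcurrency="2">
        <batch:process-records>
            <batch:step name="writeStep">
                <!-- Aggregates 100 records at a time, writes them out, then releases them -->
                <batch:aggregator size="100">
                    <file:write path="output/records.json" mode="APPEND"/>
                </batch:aggregator>
            </batch:step>
        </batch:process-records>
    </batch:job>
</flow>

With settings along these lines, only about one page of rows and one block of records should be in memory at any given time; each aggregated group is handed to file:write and then discarded, rather than the whole dataset being accumulated.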

Upvotes: 1
