Bibek Kr. Bazaz

Reputation: 545

Mule 4: Design: How to process data (files/database records) in Mule 4 without getting an "out-of-memory" error?

Scenario: I have a database that contains 100k records, which take up about 10 GB in memory. My objective is to read all of these records, segregate them into groups, and write each group out to files without running out of memory.

To achieve this, I am thinking of the design as follows:

{
   "group_1" : [...],
   "group_2" : [...]

}
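
For illustration, a grouping of this shape could be produced in a Transform Message step with DataWeave's groupBy; the category field below is just a hypothetical grouping key, not a column from my actual schema:

%dw 2.0
output application/json
---
// Group the incoming records by a hypothetical "category" column,
// producing keys such as "group_1", "group_2", ...
payload groupBy ((record) -> "group_" ++ (record.category as String))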

Questions/Concerns :

Case 1: When reading from the database, the select operation loads all 100k records into memory. Question: How can I optimize this step so that I still get all 100k records to process, but without a spike in memory usage?

Case 2: When segregating the data, I store the isolated data in the aggregator object inside the reduce operator, and that object stays in memory until I write it into files. Question: Is there a way to segregate the data and write it directly to files in the batch aggregator step, and then quickly free the memory held by the aggregator object?

Please treat this as a design question for Mule 4 flows and help me. Thanks to the community for your help and support.

Upvotes: 0

Views: 961

Answers (1)

aled

Reputation: 25699

  1. Don't load 100K records into memory. Loading high volumes of data into memory will probably cause an out-of-memory error. You are not providing details of your configuration, but the database connector streams pages of records by default, so that part is taken care of. Use the fetchSize attribute to tune the number of records read per page; the default is 10. The batch scope uses disk space to buffer data, to avoid using RAM. It also has parameters to help tune the number of records processed per step, for example the batch block size and the batch aggregator size. Using default values you would not be anywhere near 100K records in memory. Also be sure to control concurrency to limit resource usage (see the configuration sketch after this list).

Note that even after reducing all of these configurations, it doesn't mean there will be no spike when processing. Any processing consumes resources. The idea is to have a predictable, controlled spike instead of an uncontrolled one that can exhaust available resources.

  2. This question is not clear. You can't control the aggregator memory other than through the aggregator size, but it looks like it only keeps the most recently aggregated records, not all the records. Are you having any problems with that, or is this a theoretical question?
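
To make the first point concrete, here is a minimal, illustrative Mule 4 configuration sketch. The flow name, Database_Config, the SQL query, the output path and all numeric values are placeholders; they only show where fetchSize, blockSize, maxConcurrency and the aggregator size are tuned. A real flow would also have a trigger and a transform to serialize the aggregated records before writing:

<flow name="recordsFlow">
    <!-- Streams pages of rows; fetchSize controls how many rows are read per page -->
    <db:select config-ref="Database_Config" fetchSize="200">
        <db:sql>SELECT * FROM records</db:sql>
    </db:select>

    <!-- blockSize and maxConcurrency limit how many records are in memory at once -->
    <batch:job jobName="recordsBatchJob" blockSize="100" maxConcurrency="2">
        <batch:process-records>
            <batch:step name="writeStep">
                <!-- Aggregates 100 records at a time, writes them out, then releases them -->
                <batch:aggregator size="100">
                    <file:write path="output/records.json" mode="APPEND"/>
                </batch:aggregator>
            </batch:step>
        </batch:process-records>
    </batch:job>
</flow>

With settings along these lines, only about one page of rows and one block of records should be in memory at any given time; each aggregated group is handed to file:write and then discarded, rather than the whole dataset being accumulated.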

Upvotes: 1
