How to design a memory-efficient partitioning algorithm for task execution with shared dependencies?
I’m trying to design a partitioning algorithm to scale task execution in a resource-constrained environment. Here’s the situation:
- Tasks consume data via a DAL (Data Access Layer), which could be a database, external API, etc.
- All tasks currently run concurrently in a single process with a 100MB memory limit, which causes OOM errors.
- The memory issue lies in the intermediate steps performed by the DAL, not the final output size.
- I can spawn additional processes and divide the tasks among them; each process provides another 100MB of memory.
Key characteristics of the system:
- Tasks are organized as a DAG with possible dependencies between them. If task1 depends on task2, then running task1 will implicitly trigger task2.
- Some tasks share the same DAL calls with identical inputs.
- DALs don’t have persistent caching, but each maintains a local in-process cache keyed by input (modeled in the sketch below).
- I want to minimize redundant DAL calls for shared dependencies.
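To make the setup concrete, here is a minimal model of the system; all class and field names are illustrative, not from my real code:

```python
from dataclasses import dataclass, field

# Illustrative model only; names and shapes are invented for this question.
@dataclass(frozen=True)
class DalCall:
    dal: str        # e.g. "DAL1"
    input_key: str  # e.g. "input1"

@dataclass
class Task:
    name: str
    depends_on: list[str] = field(default_factory=list)    # implicitly triggered upstream tasks
    dal_calls: list[DalCall] = field(default_factory=list)

# Running task1 implicitly runs task2 first; the two identical
# DalCall("DAL1", "input1") values are served once from the DAL's local
# cache if (and only if) both tasks run in the same process.
tasks = {
    "task2": Task("task2", dal_calls=[DalCall("DAL1", "input1"),
                                      DalCall("DAL2", "input2")]),
    "task1": Task("task1", depends_on=["task2"],
                  dal_calls=[DalCall("DAL1", "input1")]),
}
```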
What I know:
- I have data on the memory consumption of each DAL call at various percentiles.
- For each pair of tasks (e.g., task1, task2), I know which DALs they share, how many times they are executed, and with which inputs (e.g., DAL1 is executed twice with input1 and input2).
- For each task, I also have its complete set of recursive upstream dependencies (the data is shaped roughly like the sketch below).
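Continuing the model above, the data I have available looks roughly like this (every number is hypothetical):

```python
# Memory consumption of each DAL call at various percentiles, in MB.
dal_memory_mb = {
    DalCall("DAL1", "input1"): {"p50": 5.0, "p95": 12.0, "p99": 20.0},
    DalCall("DAL2", "input2"): {"p50": 8.0, "p95": 25.0, "p99": 40.0},
}

# For each pair of tasks: the DAL calls they share (same DAL, same input).
shared_calls = {
    ("task1", "task2"): [DalCall("DAL1", "input1")],
}

# For each task: its full recursive upstream dependency set.
upstream = {
    "task1": {"task2"},
    "task2": set(),
}
```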
What I’ve considered so far:
I thought of creating a graph where:
- Each node represents a task.
- An edge exists between two nodes if either:
  - the tasks share at least one DAL call with identical inputs, or
  - one task depends on the other.
The idea is to weight the nodes and edges based on memory consumption and run a penalty- and constraint-based partitioning algorithm. However, I’m struggling to correctly weight the edges and nodes without “double counting” memory consumption for shared DALs.
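Here is a sketch of the weighting scheme I’m leaning toward, building on the structures above: each node carries the full memory cost of its task’s closure, and each edge carries exactly the overlapping memory, so a partition’s cost can be estimated as node weights minus internal edge weights. The percentile choice and all helper names are mine:

```python
import networkx as nx

PCT = "p95"  # sizing percentile: a trade-off between OOM safety and utilization

def call_mem(call: DalCall) -> float:
    return dal_memory_mb[call][PCT]

def closure_calls(name: str) -> set[DalCall]:
    """Every DAL call made when `name` runs, including implicit upstream tasks."""
    members = {name} | upstream[name]
    return {c for m in members for c in tasks[m].dal_calls}

def build_graph() -> nx.Graph:
    g = nx.Graph()
    for name in tasks:
        # Node weight: full memory cost of running this task in isolation.
        g.add_node(name, weight=sum(call_mem(c) for c in closure_calls(name)))
    names = list(tasks)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlap = closure_calls(a) & closure_calls(b)
            if overlap:
                # Edge weight: the memory that a naive node-weight sum would
                # count twice if a and b land in the same partition.
                g.add_edge(a, b, weight=sum(call_mem(c) for c in overlap))
    return g
```

Two observations about this encoding. First, the dependency edges from my original formulation fall out automatically: if task1 depends on task2, then closure_calls("task1") contains all of closure_calls("task2"), so the overlap edge already carries that weight. Second, node-minus-edge is exact only for pairwise overlaps; when three or more tasks share the same call, the formula over-subtracts, so a candidate partition should be scored exactly as the memory of the *union* of its members’ closure calls. A greedy bin-packer (or a METIS-style balanced cut) could then pack tasks under the 100MB cap while preferring to keep heavy edges internal.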
Once I have the partitions, I can distribute their work across different processes.
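For the distribution step, I imagine something along these lines (a sketch only; run_task is a stand-in for my real executor):

```python
from multiprocessing import Process

def run_task(name: str) -> None:
    """Stand-in for the real executor, which also triggers upstream dependencies."""
    ...

def run_partition(task_names: list[str]) -> None:
    # Each process has its own 100MB budget and its own DAL local caches:
    # calls shared *within* a partition are deduplicated by the cache,
    # while calls shared *across* partitions are paid once per process.
    for name in task_names:
        run_task(name)

def execute(partitions: list[list[str]]) -> None:
    procs = [Process(target=run_partition, args=(names,)) for names in partitions]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```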
Goal:
I need a solution that:
- Eliminates OOM errors.
- Minimizes duplicated DAL calls while respecting memory constraints.
How would you approach this problem? Any suggestions on how to properly weight the graph or alternative strategies for partitioning?
Thanks!