Why does running an element-wise pymoo Problem with a PyTorch model for evaluation work on Windows but not on Linux?

Question

I am currently working on an NSGA2 algorithm whose problem must be evaluated element-wise. Each input is evaluated by a PyTorch model. I understand that PyTorch models are extremely efficient for vectorized computation, but for this specific problem we must evaluate element-wise.

To speed up the algorithm, I am using pymoo's StarmapParallelization with 20 processes. On Windows, the behavior and results are exactly as expected, with no issues. The problem appears when I run the same script on a Linux server. On Windows, CPU utilization spikes while a generation is being evaluated, but on Linux, after an initial spike, the cores seem to stall or go idle.

Looking deeper into which cores are being used by printing out the core id, it seems like the jobs are forced onto just a fraction of the cores at a time.
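For context, the core id was printed with psutil.Process().cpu_num() inside _evaluate (the commented-out line in the script). A related check that might help narrow this down is the CPU affinity mask of each worker, since a restricted mask would also confine jobs to a subset of cores. The following is only a minimal sketch using the standard library; describe_worker is a hypothetical helper name, and os.sched_getaffinity is not available on every platform, so the sketch guards for it.

```python
import os


def describe_worker():
    """Return the PID and, where the OS exposes it, the CPUs this process may run on."""
    info = {"pid": os.getpid(), "cpu_count": os.cpu_count()}
    # os.sched_getaffinity exists only on some platforms (for example Linux).
    if hasattr(os, "sched_getaffinity"):
        info["allowed_cpus"] = sorted(os.sched_getaffinity(0))
    else:
        info["allowed_cpus"] = None
    return info


if __name__ == "__main__":
    print(describe_worker())
```

Calling describe_worker() at the top of _evaluate would show whether each worker is allowed to run on all 128 CPUs or only on a few of them.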

I am aware that Linux uses 'fork' by default instead of Windows' 'spawn', but even after forcing spawn on Linux the result is the same. I have also heard that torch and numpy could be doing something funky with multithreading, so I have tried forcing them to use only 1 thread.
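To rule out the start method being silently ignored, one option is to request an explicit context instead of relying on the global setting. This is a sketch with the standard library multiprocessing module; torch.multiprocessing is documented as a wrapper around it, so I assume the same calls apply there.

```python
import multiprocessing as mp

# Ask for an explicit "spawn" context rather than changing the global default.
ctx = mp.get_context("spawn")

# Start method this context will use for any Pool or Process it creates.
print(ctx.get_start_method())

# Start methods supported on the current platform.
print(mp.get_all_start_methods())
```

A pool created with ctx.Pool(20) would then use spawn regardless of the platform default.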

Specs:

Linux OS: Ubuntu 20.04.6 LTS
Windows OS: Windows 11
Linux CPU: AMD EPYC 7H12 64-Core Processor (128 CPUs)
Windows CPU: i7-13850HX (20 cores, 28 logical processors)
torch=2.4.0+cpu, pymoo=0.6.0.1

I am completely at a loss for why this is happening so any help/advice would be greatly appreciated! Thanks!

To demonstrate the issue, I have created a reproducible script that performs as expected on Windows but slows down drastically on Linux, along with screenshots of the CPU utilization.

```python
import os

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

import torch

torch.set_num_threads(1)
torch.set_num_interop_threads(1)

import numpy as np
import torch.multiprocessing as mp
import time
import psutil

from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize
from pymoo.core.problem import StarmapParallelization, ElementwiseProblem


def load_target_model():
    """
    Dummy function to load a PyTorch model.
    Replace this with your actual model loading logic.
    For demonstration, a simple linear model is created that outputs 2 values.
    """
    print('loading')

    # Model now outputs 2 objectives from 16 inputs.
    m = torch.nn.Linear(16, 2)

    with torch.no_grad():
        m.weight.fill_(0.5)
        m.bias.fill_(0.0)
    m.eval()
    return m


def predict(inputs, m):
    out = None
    for _ in range(1000):
        out = m(inputs)
    return out


class TargetProblem(ElementwiseProblem):
    def __init__(self, **kwargs):
        bounds = {
            "1": (1, 5),
            "2": (0.5, 1),
            "3": (1, 10),
            "4": (1, 5),
            "5": (5, 10),
            "6": (0.5, 1),
            "7": (1, 10),
            "8": (0.5, 1),
            "9": (1, 5),
            "10": (1, 10),
            "11": (0.5, 1),
            "12": (1, 5),
            "13": (1, 10),
            "14": (1, 5),
            "15": (1, 10),
            "16": (0, 1)
        }

        xl = np.array([bound[0] for bound in bounds.values()])
        xu = np.array([bound[1] for bound in bounds.values()])

        self.model = load_target_model()

        super().__init__(n_var=16, n_obj=2, n_constr=0, xl=xl, xu=xu, **kwargs)

    def _evaluate(self, x, out, *args, **kwargs):
        # print(psutil.Process().cpu_num())
        x_tensor = torch.tensor(x, dtype=torch.float32)

        with torch.no_grad():
            obj_values = predict(x_tensor, self.model)
            obj_values = obj_values.cpu().numpy()

        out["F"] = obj_values


if __name__ == "__main__":
    mp.set_start_method('spawn')

    start = time.time()

    pool = mp.Pool(20)

    runner = StarmapParallelization(pool.starmap)

    problem = TargetProblem(elementwise_runner=runner)
    algorithm = NSGA2(pop_size=10000)

    termination = ("n_gen", 4)

    res = minimize(problem,
                   algorithm,
                   termination,
                   seed=1,
                   verbose=True)

    pool.close()
    pool.join()

    print(time.time()-start)
```

Windows results/performance

Results:

```
loading
==========================================================
n_gen  |  n_eval  | n_nds  |      eps      |   indicator
==========================================================
     1 |    10000 |      1 |             - |             -
     2 |    20000 |      1 |  0.000000E+00 |             f
     3 |    30000 |      1 |  0.5519638062 |         ideal
     4 |    40000 |      1 |  0.6739788055 |         ideal
64.28410387039185

Process finished with exit code 0
```

[CPU Utilization](https://i.sstatic.net/2f66GodM.png)

Linux results/performance

Results:

```
loading
Compiled modules for significant speedup can not be used!
https://pymoo.org/installation.html#installation

To disable this warning:
from pymoo.config import Config
Config.warnings['not_compiled'] = False

==========================================================
n_gen  |  n_eval  | n_nds  |      eps      |   indicator
==========================================================
     1 |    10000 |      1 |             - |             -
     2 |    20000 |      1 |  0.7407798767 |         ideal
     3 |    30000 |      1 |  1.9684743881 |         ideal
     4 |    40000 |      1 |  0.7951240540 |         ideal
791.8529000282288

Process finished with exit code 0
```

[Initial CPU Utilization](https://i.sstatic.net/V0v3H4Gt.png)

[CPU Utilization after a generation](https://i.sstatic.net/IYmY3k5W.png)
