I am currently working on an NSGA2 algorithm in which every candidate must be evaluated ELEMENTWISE. Each input is evaluated by a PyTorch model. I understand that PyTorch models are extremely efficient for vectorized computation, but for this specific problem we must evaluate elementwise.
To speed up the algorithm, I am using pymoo's StarmapParallelization with 20 processes. On Windows, the behavior and results are exactly as expected. The problem appears when I run the same script on a Linux server. On Windows, CPU utilization spikes while a generation is being evaluated, but on Linux, after an initial spike, all the cores seem to stall or die.
When I print the core id to see which cores are being used, it seems that the jobs are forced onto just a fraction of the cores at a time.
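To complement the core id printout, it may help to also look at the set of CPUs the process is allowed to run on. The following is a minimal sketch, not part of the original script: `cpu_placement` is a hypothetical helper name, and it relies on `os.sched_getaffinity`, which is only available on Linux (on other platforms the sketch reports `None`).

```python
import os


def cpu_placement():
    """Return a small report about where the current process may run."""
    report = {"pid": os.getpid(), "cpu_count": os.cpu_count()}
    # os.sched_getaffinity exists on Linux only; elsewhere the set of
    # allowed CPUs is reported as None.
    if hasattr(os, "sched_getaffinity"):
        report["allowed_cpus"] = sorted(os.sched_getaffinity(0))
    else:
        report["allowed_cpus"] = None
    return report


if __name__ == "__main__":
    print(cpu_placement())
```

Calling this helper from inside `_evaluate` (or at worker start-up) would show whether the workers' allowed CPU set is much smaller than the 128 CPUs on the server. A small set would only be a hint, not proof, that something is restricting affinity.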
I am aware that Linux defaults to ‘fork’ whereas Windows uses ‘spawn’, but even after forcing spawn on Linux the result is the same. I have also heard that torch and numpy can interfere with multiprocessing through their own multithreading, so I have tried forcing them to use only 1 thread.
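One way to double-check both settings is to request an explicit multiprocessing context and read back the start method and the thread-limit environment variables. This is a sketch using only the standard library `multiprocessing` module; `parallel_settings` and `THREAD_VARS` are hypothetical names introduced for illustration.

```python
import multiprocessing as mp
import os

# Environment variables commonly used to cap BLAS/OpenMP thread pools.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def parallel_settings(ctx):
    """Report the start method of ctx and the thread caps seen by this process."""
    return {
        "start_method": ctx.get_start_method(),
        "thread_env": {name: os.environ.get(name) for name in THREAD_VARS},
    }


# An explicit context avoids depending on the global default start method.
ctx = mp.get_context("spawn")
settings = parallel_settings(ctx)
print(settings)
```

A pool created with `ctx.Pool(20)` would then use the start method of that context regardless of the platform default. A value of `None` in `thread_env` means the variable is not set in the process that ran the check.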
Specs:
Linux OS: Ubuntu 20.04.6 LTS
Windows OS: Windows 11
Linux CPU: AMD EPYC 7H12 64-Core Processor (128 CPUs)
Windows CPU: i7-13850HX (20 cores, 28 logical processors)
torch=2.4.0+cpu, pymoo=0.6.0.1
I am completely at a loss for why this is happening so any help/advice would be greatly appreciated! Thanks!
To demonstrate the issue, I have created a reproducible script that performs well on Windows but slows down drastically on Linux, along with screenshots of the CPU utilization.
import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
import torch
torch.set_num_threads(1)
torch.set_num_interop_threads(1)
import numpy as np
import torch.multiprocessing as mp
import time
import psutil
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize
from pymoo.core.problem import StarmapParallelization, ElementwiseProblem


def load_target_model():
    """
    Dummy function to load a PyTorch model.
    Replace this with your actual model loading logic.
    For demonstration, a simple linear model is created that outputs 2 values.
    """
    print('loading')
    # Model now outputs 2 objectives from 16 inputs.
    m = torch.nn.Linear(16, 2)
    with torch.no_grad():
        m.weight.fill_(0.5)
        m.bias.fill_(0.0)
    m.eval()
    return m


def predict(inputs, m):
    out = None
    for _ in range(1000):
        out = m(inputs)
    return out


class TargetProblem(ElementwiseProblem):
    def __init__(self, **kwargs):
        bounds = {
            "1": (1, 5),
            "2": (0.5, 1),
            "3": (1, 10),
            "4": (1, 5),
            "5": (5, 10),
            "6": (0.5, 1),
            "7": (1, 10),
            "8": (0.5, 1),
            "9": (1, 5),
            "10": (1, 10),
            "11": (0.5, 1),
            "12": (1, 5),
            "13": (1, 10),
            "14": (1, 5),
            "15": (1, 10),
            "16": (0, 1)
        }
        xl = np.array([bound[0] for bound in bounds.values()])
        xu = np.array([bound[1] for bound in bounds.values()])
        self.model = load_target_model()
        super().__init__(n_var=16, n_obj=2, n_constr=0, xl=xl, xu=xu, **kwargs)

    def _evaluate(self, x, out, *args, **kwargs):
        # print(psutil.Process().cpu_num())
        x_tensor = torch.tensor(x, dtype=torch.float32)
        with torch.no_grad():
            obj_values = predict(x_tensor, self.model)
        obj_values = obj_values.cpu().numpy()
        out["F"] = obj_values


if __name__ == "__main__":
    mp.set_start_method('spawn')
    start = time.time()
    pool = mp.Pool(20)
    runner = StarmapParallelization(pool.starmap)
    problem = TargetProblem(elementwise_runner=runner)
    algorithm = NSGA2(pop_size=10000)
    termination = ("n_gen", 4)
    res = minimize(problem,
                   algorithm,
                   termination,
                   seed=1,
                   verbose=True)
    pool.close()
    pool.join()
    print(time.time()-start)
Windows results/performance
Results:
loading
==========================================================
n_gen | n_eval | n_nds | eps | indicator
==========================================================
1 | 10000 | 1 | - | -
2 | 20000 | 1 | 0.000000E+00 | f
3 | 30000 | 1 | 0.5519638062 | ideal
4 | 40000 | 1 | 0.6739788055 | ideal
64.28410387039185
Process finished with exit code 0
[CPU Utilization](https://i.sstatic.net/2f66GodM.png)
Linux results/performance
Results:
loading
Compiled modules for significant speedup can not be used!
https://pymoo.org/installation.html#installation
To disable this warning:
from pymoo.config import Config
Config.warnings['not_compiled'] = False
==========================================================
n_gen | n_eval | n_nds | eps | indicator
==========================================================
1 | 10000 | 1 | - | -
2 | 20000 | 1 | 0.7407798767 | ideal
3 | 30000 | 1 | 1.9684743881 | ideal
4 | 40000 | 1 | 0.7951240540 | ideal
791.8529000282288
Process finished with exit code 0
[Initial CPU Utilization](https://i.sstatic.net/V0v3H4Gt.png)
[CPU Utilization after a generation](https://i.sstatic.net/IYmY3k5W.png)