Blue Whale

Reputation: 21

PBS: job on two nodes uses memory of only one

I am trying to run a job (Python code) on a cluster using MPI. There is 63 GB of memory available on each node. When I run it on one node, I specify the PBS parameters as follows (only the relevant parameters are listed here):

#PBS -l mem=60GB
#PBS -l nodes=node01.cluster:ppn=32
time mpiexec -n 32 python code.py

That works just fine.

Since the PBS man page says mem is the memory for the entire job, my parameters when trying to run it on two nodes are:

#PBS -l mem=120GB
#PBS -l nodes=node01.cluster:ppn=32+node02.cluster:ppn=32
time mpiexec -n 64 python code.py

This doesn't work (qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max mem requirement). It fails even if I set, for example, mem=70GB (in case the system needs some extra memory). If I set mem=60GB when trying to use both nodes, I get

=>> PBS: job killed: mem job total xx kb exceeded limit yy kb.

I tried it with pmem as well (that is, pmem=1875MB), but with no success.

My question is: how can I use the entire 120 GB of memory?

Upvotes: 2

Views: 1881

Answers (1)

Hristo Iliev

Reputation: 74395

Torque / PBS ignores the mem resource unless the job uses a single node (see here):

Maximum amount of physical memory used by the job. (Ignored on Darwin, Digital Unix, Free BSD, HPUX 11, IRIX, NetBSD, and SunOS. Also ignored on Linux if number of nodes is not 1. Not implemented on AIX and HPUX 10.)

You should instead use the pmem resource, which limits the memory per job process. With ppn=32, setting pmem to 1920MB gives you 60 GB per node. Keep in mind that pmem does not allow memory to be distributed flexibly between the processes on a node the way mem does: mem is accounted as an aggregate value over the whole job, while pmem applies to each process individually.
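As a minimal sketch, the two-node submission from the question might look like this with pmem (the node names and the 1920MB value assume ppn=32 and 60 GB of usable memory per node, as above):

#PBS -l nodes=node01.cluster:ppn=32+node02.cluster:ppn=32
#PBS -l pmem=1920MB

# 32 processes per node x 1920 MB each = 60 GB per node,
# enforced on each process rather than on the job as a whole
time mpiexec -n 64 python code.py

One consequence of the per-process accounting: a single rank that temporarily needs more than 1920 MB will be killed even if the node as a whole is far below 60 GB.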

Upvotes: 2
