armen

Reputation: 443

watching memory in PBS

I'm running a job on a cluster (using PBS) that runs out of memory. I'm trying to print the memory status of each node separately while my job is running. I created a shell script and call it from inside my job submission script, but when I submit the job I get a permission-denied error on the line that calls the script, and I don't understand why.

Secondly, I was thinking of putting a 'watch free' or 'watch ps aux' in my script file, but now I suspect that would leave my submitted job stuck in the memory-watching command and never reach the line that calls my parallel program.

In short: how can I log memory usage in PBS for the jobs I submit? My code is a C++ program using the MRMPI (MPI MapReduce) library.

Upvotes: 2

Views: 6163

Answers (2)

armen

Reputation: 443

NOTE: if ssh access to the cluster's compute nodes is disabled, this method won't work!

This is how I ended up doing it. It might not be the best way, but it works. In summary: I added short sleep periods between my map and reduce steps by calling the C++ sleep() function, and I wrote a script that ssh's into the nodes my job is running on and writes the memory status of each node to a file (using the 'free' or 'top' commands).

In more detail: in my PBS job script, somewhere before the call to my binary, I added this line:

#this goes in job script, before the call to the job binary:
cat $PBS_NODEFILE > /some/path/nodelist.log

This writes a list of the nodes that my job runs on, into a file.

I have a second script "watchmem.sh":

#!/bin/bash
# Poll the memory status of every node 60 times, once every 10 seconds
for i in $(seq 60)
do
    while read -r line
    do
        # run remote.sh on the node, passing the node name as $1;
        # ssh's stdin is the script file, so it doesn't consume nodelist.log
        ssh "$line" 'bash -s' < /some/path/remote.sh "$line"
    done < /some/path/nodelist.log
    sleep 10
done

This script reads the nodelist.log file we generated earlier, ssh's into each node, and runs a third (and last) script, remote.sh, on each of them.

remote.sh contains the commands that run on every node of the job. In this case it appends the current time and the output of 'free' to a separate file per node (the node name, passed in as $1, doubles as the file name):

#remote.sh -- $1 is the node name, which also serves as the output file
echo "Current time : $(date)" >> "$1"
free >> "$1"   # 'free' can be replaced by top in batch mode, e.g. 'top -b -n 1'

Comparing the timestamps in these files with the times printed by my binary lets me track the memory consumption (alloc/dealloc) at each step. The sleep periods in my job make sure the scripts capture the memory status between steps; the 'sleep 10' in watchmem.sh avoids unnecessary writes to the file, and its duration should be comparable to the sleep duration in the main job.
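Putting it together, a minimal sketch of how the pieces could be wired up in the PBS job script looks like this. The paths, the mpirun invocation, and the binary name here are placeholders for illustration, not my exact setup:

#job script sketch (assumed paths and binary name; adapt to your setup)
cat $PBS_NODEFILE > /some/path/nodelist.log

#start the watcher in the background and remember its PID
/some/path/watchmem.sh &
WATCHER_PID=$!

#run the actual MRMPI binary (invocation assumed)
mpirun /some/path/my_mrmpi_program

#stop the watcher once the real work is done
kill $WATCHER_PID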

Upvotes: 0

dbeer

Reputation: 7203

To see how much memory is being used throughout the job, run qstat -f:

$ qstat -f | grep used
    resources_used.cput = 00:02:51
    resources_used.energy_used = 0
    resources_used.mem = 6960kb
    resources_used.vmem = 56428kb
    resources_used.walltime = 00:01:26
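If you want periodic samples rather than a one-off snapshot, a minimal sketch is to poll from inside the job script itself; this assumes qstat is callable from the job's first node and relies on the PBS_JOBID variable that PBS sets inside the job (the log path is a placeholder):

#hypothetical: sample this job's usage every 30 seconds in the background
while true
do
    qstat -f $PBS_JOBID | grep resources_used >> /some/path/usage.log
    sleep 30
done &

PBS normally cleans up leftover background processes when the job exits, so the loop dies with the job.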

To examine past jobs, you can look in the accounting file. This is located in the server_priv/accounting directory; the default is /var/spool/torque/server_priv/accounting/.

The entries look like this:

09/14/2015 10:52:11;E;202.napali;user=dbeer group=company jobname=intense.sh queue=batch ctime=1442248534 qtime=1442248534 etime=1442248534 start=1442248536 owner=dbeer@napali exec_host=napali/0-2 Resource_List.neednodes=1:ppn=3 Resource_List.nodect=1 Resource_List.nodes=1:ppn=3 session=20415 total_execution_slots=3 unique_node_count=1 end=0 Exit_status=0 resources_used.cput=1989 resources_used.energy_used=0 resources_used.mem=9660kb resources_used.vmem=58500kb resources_used.walltime=995
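The resources_used fields can be pulled out of an entry like this with standard text tools. A minimal sketch, assuming Torque's convention of one date-named accounting file per day and using the job ID from the sample entry above:

#print the resources_used fields of job 202's end (E) record
grep ';E;202\.' /var/spool/torque/server_priv/accounting/20150914 \
    | tr ' ' '\n' | grep '^resources_used'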

Upvotes: 3
