alper

Reputation: 3410

Slurm: How can I prevent a job's information from being removed?

Using sacct I want to obtain information about my completed jobs.

This answer explains how to obtain a job's information.

I submitted a job named jobName.sh, which got job ID 176. Twelve hours and about 200 new jobs later, I wanted to check the information for job 176, but I got slurm_load_jobs error: Invalid job id specified.

scontrol show job 176
slurm_load_jobs error: Invalid job id specified

The following command also returns nothing: sacct --name jobName.sh

I assume there is a time limit on how long information about previously submitted jobs is kept, and that the information for my earlier jobs has been removed. Is there such a limit? How can I set it to a very large value to prevent the records from being deleted?

Please note that JobRequeue=0 is set in slurm.conf.

Upvotes: 1

Views: 4352

Answers (2)

alper

Reputation: 3410

The Slurm documentation states:

MinJobAge The minimum age of a completed job before its record is purged from Slurm's active database. Set the values of MaxJobCount and MinJobAge to ensure the slurmctld daemon does not exhaust its memory or other resources. The default value is 300 seconds. A value of zero prevents any job record purging. In order to eliminate some possible race conditions, the minimum non-zero value for MinJobAge recommended is 2.

In my slurm.conf file, MinJobAge was 300, which is 5 minutes. That is why each completed job's information was removed 5 minutes after the job finished. I increased the value of MinJobAge to delay the purge.
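As a rough illustration of that change, the relevant slurm.conf lines might look like the excerpt below. The numbers are example values only, not a recommendation, and the comment syntax is the usual `#` style of slurm.conf:

```shell
# slurm.conf (excerpt) -- example values, adjust to your site
# Keep completed job records in slurmctld's memory for 24 hours
# instead of the default 300 seconds.
MinJobAge=86400
# Upper bound on job records held by slurmctld; a large MinJobAge makes
# it easier to hit this limit, so the two should be sized together.
MaxJobCount=10000
```

After editing the file, `scontrol reconfigure` asks the daemons to re-read it (depending on the Slurm version, a restart of slurmctld may be needed instead), and `scontrol show config | grep MinJobAge` shows the value currently in effect. Keep in mind the warning in the quoted documentation: the longer records are kept, the more memory slurmctld uses.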

Upvotes: 2

Bub Espinja

Reputation: 4571

Assuming that you are using MySQL to store the accounting data, you can tune the purge time, among other settings, in the slurmdbd configuration file slurmdbd.conf. Here are some examples:

PurgeJobAfter=12hours
PurgeJobAfter=1month
PurgeJobAfter=24months

If not set (default), then job records are never purged.

More info.
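Once accounting records are retained, a lookup can fall back from scontrol to sacct when slurmctld has already dropped the job from memory. The helper below is only a sketch: `job_info` is a made-up name, the field list is one possible choice, and it assumes that accounting storage is configured and that scontrol exits with a non-zero status for an unknown job ID.

```shell
# Hypothetical helper: try slurmctld's in-memory record first, then the
# accounting database.
job_info() {
    jobid="$1"
    start="$2"   # earliest date to search, e.g. 2018-01-01

    # scontrol only knows jobs that slurmctld still holds in memory
    # (see MinJobAge); assumed to fail for purged job IDs.
    if scontrol show job "$jobid" 2>/dev/null; then
        return 0
    fi

    # sacct reads the accounting records. By default it only looks back
    # to the start of the current day, so pass an explicit start time.
    sacct -j "$jobid" --starttime "$start" \
          --format=JobID,JobName,State,Elapsed,ExitCode
}
```

For the job in the question this would be called as `job_info 176 2018-01-01`, with the date replaced by any day before the job was submitted.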

Upvotes: 3
