Reputation: 77
The problem statement in short: Is there a way in LSF to pass a signal SIGCONT/SIGTSTP to all processes running within a job?
I have a Perl wrapper script that runs on LSF (Version 9.1.2) and starts a tool (Source not available) on the same LSF machine as the Perl script.
The tool starts 2 processes, one for license management and another for doing the actual work. It also supports an option where sending SIGTSTP/SIGCONT to both processes will release/reacquire the license (which is what I wish to achieve).
Running bkill -s SIGCONT <JOB_ID>
only resumes the tool process and not the license process, which is a problem.
I tried to see if I can send the signals to the Perl script's own PGID, but the license process starts its own process group.
Any suggestions to move forward through Perl or LSF options are welcome.
Thanks, Abhishek
Upvotes: 3
Views: 280
Reputation: 2282
I tried to see if I can send the signals to the Perl script's own PGID, but the license process starts its own process group.
This is likely your problem right here. LSF keeps track of "processes running within the job" by process group. If your job spawns a process that runs within its own process group (say by daemonizing itself) then it essentially is a runaway process out of LSF's control -- it becomes your job's responsibility to manage it.
For reference, see the section on "Detached processes" here.
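To check whether a given process really has left your script's process group, you can compare process group IDs from inside the wrapper. The following is only a minimal sketch: the helper name same_process_group is made up for illustration, and the forked child that calls setpgrp is just a stand-in for a helper that detaches itself, not the actual license process.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $pid is in the same process group as this script.
sub same_process_group {
    my ($pid) = @_;
    return getpgrp($pid) == getpgrp(0);
}

# Stand-in for a detaching helper: a child that moves itself into a new
# process group and then tells the parent it has done so.
pipe(my $reader, my $writer) or die "pipe: $!";
my $child = fork();
die "fork failed: $!" unless defined $child;
if ($child == 0) {
    close $reader;
    setpgrp(0, 0);                 # become leader of a new process group
    print {$writer} "ready\n";
    close $writer;                 # flushes the message to the parent
    sleep 30;
    exit 0;
}
close $writer;
my $ready = <$reader>;             # wait until the child has changed group

my $shared = same_process_group($child);
printf "child %d shares our process group: %s\n",
    $child, $shared ? "yes" : "no";

# Clean up the stand-in child.
kill 'TERM', $child;
waitpid($child, 0);
```

If the check reports "no" for the license process, a signal sent to the wrapper's process group will not reach it, which matches what you are seeing.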
As for options:
Ask your LSF administrator whether LSF_PROCESS_TRACKING and LSF_LINUX_CGROUP_ACCT are set in lsf.conf. If they aren't, you can ask for them to be set and see if that helps in your case (you need to make sure the host you're running on supports cgroups). In 9.1.2 this feature is turned on at installation time, so this option might not actually help you, for various reasons (for example, your hosts don't have cgroups enabled).
Since you're using a Perl script, you can install custom signal handlers for SIGCONT/SIGTSTP in your script using sigtrap or the like, and forward the signals to the license process yourself when your script receives them through bkill. See here.
Upvotes: 3
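The signal-forwarding idea from the answer could look roughly like the sketch below. It uses plain %SIG handlers rather than sigtrap (the "or the like" variant). Several things here are assumptions: the names @detached_pids, forward_signal and spawn_stand_in are invented for illustration, the forked child only stands in for the license process, and how you obtain the real license process PID depends on the tool. The kill to $$ merely simulates the signal that bkill -s would deliver to the wrapper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# PIDs outside this script's process group (hypothetical; discovering the
# real license process PID is tool-specific and not shown here).
my @detached_pids;

# Forward $sig to every detached PID; returns the number of PIDs signalled.
sub forward_signal {
    my ($sig) = @_;
    return kill $sig, @detached_pids;
}

# Stand-in for the license process: a child in its own process group that
# reports each signal it receives over a pipe.
sub spawn_stand_in {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        setpgrp(0, 0);                       # leave the wrapper's group
        $writer->autoflush(1);
        $SIG{TSTP} = sub { print {$writer} "TSTP\n" };
        $SIG{CONT} = sub { print {$writer} "CONT\n" };
        print {$writer} "ready\n";
        sleep 1 while 1;                     # runs until the parent kills it
    }
    close $writer;
    my $ready = <$reader>;                   # child's handlers are installed
    return ($pid, $reader);
}

my ($pid, $reader) = spawn_stand_in();
@detached_pids = ($pid);

# Handlers in the wrapper: pass the signal on. Because SIGTSTP is caught,
# the wrapper itself keeps running instead of being suspended.
$SIG{TSTP} = sub { forward_signal('TSTP') };
$SIG{CONT} = sub { forward_signal('CONT') };

# Simulate the signals the wrapper would receive from bkill -s.
my @seen;
for my $sig (qw(TSTP CONT)) {
    kill $sig, $$;
    chomp(my $line = <$reader>);
    push @seen, $line;
}
print "stand-in received: @seen\n";

# Clean up the stand-in child.
kill 'TERM', $pid;
waitpid($pid, 0);
```

In a real wrapper you would drop the stand-in and the simulated kill, fill @detached_pids with the license process PID once the tool has started, and keep only the two %SIG handlers.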