mcsim

Cgroup unexpectedly propagates SIGSTOP to the parent

I have a small script to run a command inside a cgroup that limits CPU time:

$ cat cgrun.sh
#!/bin/bash

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <bin>"
    exit 1
fi

sudo cgcreate -g cpu:/cpulimit
sudo cgset -r cpu.cfs_period_us=1000000 cpulimit
sudo cgset -r cpu.cfs_quota_us=100000 cpulimit
sudo cgexec -g cpu:cpulimit sudo -u "$USER" "$@"
sudo cgdelete cpu:/cpulimit

I start the command with: ./cgrun.sh /bin/sleep 10

Then, from another terminal, I send SIGSTOP to the sleep process. Somehow, at that moment the parent processes, sudo and cgexec, receive this signal as well. Then I send SIGCONT to sleep, which lets it continue.

However, sudo and cgexec stay stopped and never reap the sleep process, which remains a zombie. I don't understand how this can happen. How can I prevent it? Moreover, I cannot send SIGCONT to sudo and cgexec myself, because I send the signals as a regular user, while these commands run as root.

Here is how it looks in htop (some columns omitted):

    PID USER S CPU% MEM%   TIME+  Command
1222869 user S  0.0  0.0  0:00.00 │     │  └─ /bin/bash ./cgrun.sh /bin/sleep 10
1222882 root T  0.0  0.0  0:00.00 │     │     └─ sudo cgexec -g cpu:cpulimit sudo -u user /bin/sleep 10
1222884 root T  0.0  0.0  0:00.00 │     │        └─ sudo -u user /bin/sleep 10
1222887 user Z  0.0  0.0  0:00.00 │     │           └─ /bin/sleep 10

How can I create a cgroup in such a way that SIGSTOP is not propagated to the parent processes?

UPD

If I start the process using systemd-run, I do not observe the same behavior:

sudo systemd-run --uid=$USER -t -p CPUQuota=10% sleep 10
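
For comparison, the CFS values set in cgrun.sh grant 100000 microseconds of CPU time per 1000000 microsecond period, which works out to the same 10% of one CPU that CPUQuota=10% asks for. systemd may choose a different period length, so only the ratio is directly comparable. A small sketch of that arithmetic, with the values copied from the script:

```shell
#!/bin/bash
# CFS settings copied from cgrun.sh (both values are in microseconds).
period_us=1000000   # cpu.cfs_period_us
quota_us=100000     # cpu.cfs_quota_us

# Share of a single CPU that the cgroup may consume in each period.
cpu_pct=$(( quota_us * 100 / period_us ))

echo "CPU limit: ${cpu_pct}% of one CPU"
```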

Answers (1)

RKou

Instead of using the "cg tools", I would do it the "hard way" with plain shell commands: create the cpulimit cgroup (it is just a mkdir), set the CFS parameters (echo the values into the corresponding cpu.cfs_* files), start a sub-shell with the (...) notation, move that sub-shell into the cgroup (echo its PID into the cgroup's tasks file) and exec the requested command in it. Since exec replaces the sub-shell, the command takes over the sub-shell's PID and cgroup, and it runs as the calling user as a direct child of the script, with no sudo or cgexec process in between.

Hence, cgrun.sh would look like this:

#!/bin/bash

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <bin>" >&2
    exit 1
fi

CGTREE=/sys/fs/cgroup/cpu

sudo -s <<EOF
[ ! -d ${CGTREE}/cpulimit ] && mkdir ${CGTREE}/cpulimit
echo 1000000 > ${CGTREE}/cpulimit/cpu.cfs_period_us
echo 100000 > ${CGTREE}/cpulimit/cpu.cfs_quota_us
EOF

# Sub-shell in background
(
  # Pid of the current sub-shell
  # ($$ would return the pid of the father process)
  MY_PID=$BASHPID

  # Move current process into the cgroup
  sudo sh -c "echo ${MY_PID} > ${CGTREE}/cpulimit/tasks"

  # Run the command with calling user id (it inherits the cgroup)
  exec "$@"

) &

# Wait for the sub-shell
wait $!

# Exit code of the sub-shell
rc=$?

# Delete the cgroup
sudo rmdir ${CGTREE}/cpulimit

# Exit with the return code of the sub-shell
exit $rc
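
The comment about $BASHPID is the crux of the sub-shell trick: inside ( ... ), $$ still expands to the PID of the script itself, so writing $$ to the tasks file would move the script's own process into the cgroup instead of just the sub-shell. A minimal bash-specific sketch (no cgroup or root needed) showing the difference:

```shell
#!/bin/bash
# PIDs as seen by the script's top-level shell.
top_dollar=$$
top_bashpid=$BASHPID

# The same two expansions evaluated inside a ( ... ) sub-shell, as in cgrun.sh.
# Command substitution captures what the sub-shell prints.
read -r sub_dollar sub_bashpid <<< "$( ( echo "$$ $BASHPID" ) )"

echo "top level: \$\$=$top_dollar BASHPID=$top_bashpid"
echo "sub-shell: \$\$=$sub_dollar BASHPID=$sub_bashpid"
```

In the sub-shell line, $$ matches the top-level value while BASHPID differs, which is why the answer's script stores $BASHPID in MY_PID.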

Run it (first print the PID of the current shell, so that the process hierarchy can be displayed from another terminal):

$ echo $$
112588
$ ./cgrun.sh /bin/sleep 50

This creates the following process hierarchy:

$ pstree -p 112588
bash(112588)-+-cgrun.sh(113079)---sleep(113086)

Stop the sleep process:

$ kill -STOP 113086
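
If you would rather confirm the stop from a script than from pstree or htop, one option is the State: line of /proc/<pid>/status, which shows T for a stopped process. The sketch below uses a throwaway background sleep as a stand-in for the command launched by cgrun.sh; the helper names proc_state and wait_for_state are made up for this example.

```shell
#!/bin/bash
# Throwaway child standing in for the command launched by cgrun.sh.
sleep 3 &
pid=$!

# Print the one-letter state (R, S, T, Z, ...) from /proc/<pid>/status.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Signals are delivered asynchronously, so poll for up to ~2 seconds
# until process $1 reports state $2.
wait_for_state() {
    local p=$1 want=$2 i
    for (( i = 0; i < 40; i++ )); do
        [ "$(proc_state "$p")" = "$want" ] && return 0
        sleep 0.05
    done
    return 1
}

kill -STOP "$pid"
wait_for_state "$pid" T && state_after_stop=$(proc_state "$pid")

kill -CONT "$pid"
wait_for_state "$pid" S && state_after_cont=$(proc_state "$pid")

echo "after SIGSTOP: ${state_after_stop:-?}  after SIGCONT: ${state_after_cont:-?}"

# Let the child finish normally so that it is reaped instead of left a zombie.
wait "$pid"
```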

Look at the cgroup to verify that the sleep command is running in it (its PID is in the tasks file) and that the CFS parameters are set correctly:

$ ls -l /sys/fs/cgroup/cpu/cpulimit/
total 0
-rw-r--r-- 1 root root 0 nov.    5 22:38 cgroup.clone_children
-rw-r--r-- 1 root root 0 nov.    5 22:38 cgroup.procs
-rw-r--r-- 1 root root 0 nov.    5 22:36 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 nov.    5 22:36 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 nov.    5 22:38 cpu.shares
-r--r--r-- 1 root root 0 nov.    5 22:38 cpu.stat
-rw-r--r-- 1 root root 0 nov.    5 22:38 cpu.uclamp.max
-rw-r--r-- 1 root root 0 nov.    5 22:38 cpu.uclamp.min
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.stat
-rw-r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_all
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_sys
-r--r--r-- 1 root root 0 nov.    5 22:38 cpuacct.usage_user
-rw-r--r-- 1 root root 0 nov.    5 22:38 notify_on_release
-rw-r--r-- 1 root root 0 nov.    5 22:36 tasks
$ cat /sys/fs/cgroup/cpu/cpulimit/tasks 
113086  # This is the pid of sleep
$ cat /sys/fs/cgroup/cpu/cpulimit/cpu.cfs_*
1000000
100000
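
Membership can also be checked from the process side via /proc/<pid>/cgroup, where each line has the form ID:CONTROLLERS:PATH. Below is a sketch of a helper (the name cpu_cgroup_of is hypothetical) that prints the path of the hierarchy holding the cpu controller, and falls back to the single 0::PATH line on a system that only has the unified (v2) hierarchy:

```shell
#!/bin/bash
# Print the cgroup path of process $1 for the cpu controller.
# Each line of /proc/<pid>/cgroup looks like  ID:CONTROLLERS:PATH .
cpu_cgroup_of() {
    awk -F: '
        {
            path = $0
            sub(/^[^:]*:[^:]*:/, "", path)          # keep everything after the 2nd colon
            n = split($2, ctl, ",")
            for (i = 1; i <= n; i++)
                if (ctl[i] == "cpu") { print path; found = 1; exit }
            if ($1 == "0" && $2 == "") v2 = path    # unified (v2) hierarchy line
        }
        END { if (!found && v2 != "") print v2 }
    ' "/proc/$1/cgroup"
}

cpu_cgroup_of "$$"
```

On a cgroup v1 system like the one above, cpu_cgroup_of 113086 would be expected to print /cpulimit while sleep is still running.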

Send SIGCONT to the sleep process:

$ kill -CONT 113086

The process finishes and the cgroup is destroyed:

$ ls -l /sys/fs/cgroup/cpu/cpulimit
ls: cannot access '/sys/fs/cgroup/cpu/cpulimit': No such file or directory

Once the script has finished, check its exit code (it is the exit code of the launched command):

$ echo $?
0
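
These last steps should work because the shell running cgrun.sh is non-interactive and therefore has job control disabled: when its child is stopped it simply keeps waiting, and wait $! only returns once the child has exited, passing on the child's exit status. A small sketch reproducing that sequence without any cgroup; the exit status 3 is an arbitrary value chosen for illustration:

```shell
#!/bin/bash
# Background child standing in for the sub-shell that runs exec "$@".
# It sleeps for a second and then exits with the arbitrary status 3.
( sleep 1; exit 3 ) &
child=$!

# Stop the child immediately and resume it 0.5 s later from a helper,
# mimicking "kill -STOP" and "kill -CONT" sent from another terminal.
kill -STOP "$child"
( sleep 0.5; kill -CONT "$child" ) &
helper=$!

# Without job control (the default for scripts), the stop is not reported
# to this shell, so wait only returns once the child has exited.
wait "$child"
rc=$?
wait "$helper"

echo "child exit status: $rc"
```

If the waiting shell were interrupted by the stop instead, wait would return 128 plus the signal number rather than 3.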
