Floegipoky

Reputation: 3273

Why won't Go kill a child process correctly?

The following works fine when cmd finishes in the allotted time, but the timeout does not work. It prints "It's dead Jim", yet the process is not actually killed: it keeps running, and "Done waiting" never prints.

func() error {
    var output bytes.Buffer
    cmd := exec.Command("Command", args...)
    cmd.Dir = filepath.Dir(srcFile)
    cmd.Stdout, cmd.Stderr = &output, &output
    if err := cmd.Start(); err != nil {
        return err
    }
    // Kill the command if it is still running after two seconds.
    defer time.AfterFunc(time.Second*2, func() {
        fmt.Printf("Nobody got time fo that\n")
        if err := cmd.Process.Signal(syscall.SIGKILL); err != nil {
            fmt.Printf("Error:%s\n", err)
        }
        fmt.Printf("It's dead Jim\n")
    }).Stop()
    err := cmd.Wait()
    fmt.Printf("Done waiting\n")
    return err
}()

I don't think it should make a difference, but for what it's worth the command is go test html. It times out because I inject an error that causes an infinite loop before running it. To add to the confusion, I tried running it with go test net: there was a timeout, and it worked correctly.

Upvotes: 17

Views: 20981

Answers (6)

Fritz Lin

Reputation: 371

I finally solved my problem.

ref: Killing a child process and all of its children in Go https://medium.com/@felixge/killing-a-child-process-and-all-of-its-children-in-go-54079af94773

ctx, cancel := context.WithCancel(context.Background())
defer cancel()
sh := "du -h -d 1 ~ 2>/dev/null"
cmd := util.NewCtxPgidCommand(ctx, "bash", "-c", sh)

package util

import (
    "context"
    "fmt"
    "os/exec"
    "syscall"
)

func NewCtxPgidCommand(ctx context.Context, name string, args ...string) *exec.Cmd {
    cmd := exec.CommandContext(ctx, name, args...)

    // Killing a child process and all of its children in Go
    // https://medium.com/@felixge/killing-a-child-process-and-all-of-its-children-in-go-54079af94773
    cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

    // Mutate and override cmd.Cancel
    cmd.Cancel = func() error {
        var errors []error
        if err := cmd.Process.Kill(); err != nil {
            errors = append(errors, err)
        }
        if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
            errors = append(errors, err)
        }
        if len(errors) > 0 {
            return fmt.Errorf("error cancelling pid=%d, errors=%v", cmd.Process.Pid, errors)
        }
        return nil
    }
    return cmd
}

Upvotes: 0

c2knaps

Reputation: 1837

I'm not sure when it was added, but as of Go 1.11 you can set Pdeathsig on a subprocess to syscall.SIGKILL (on Linux). This will kill the child when the parent exits.

// exec.Command returns a single *exec.Cmd value.
cmd := exec.Command("long-running command")
cmd.SysProcAttr = &syscall.SysProcAttr{
    Pdeathsig: syscall.SIGKILL,
}
if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

os.Exit(1)

The cmd should be killed on exit.

Upvotes: 13

Rots

Reputation: 787

Just for reference, I'll put my Windows solution here as well:

func kill(cmd *exec.Cmd) error {
    kill := exec.Command("TASKKILL", "/T", "/F", "/PID", strconv.Itoa(cmd.Process.Pid))
    kill.Stderr = os.Stderr
    kill.Stdout = os.Stdout
    return kill.Run()
}

Upvotes: 12

selden

Reputation: 333

On POSIX systems you can start the child in a new session with setsid. With the following, the child becomes the leader of a new session and process group (if it isn't one already). Signalling the negative PID then targets the whole group, so the child's own children die too. At least, that is my experience.

cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
cmd.Start()
time.Sleep(5 * time.Second) // a bare time.Sleep(5) would sleep for 5 nanoseconds
if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
    log.Println("failed to kill: ", err)
}

Upvotes: 5

Matt

Reputation: 836

It looks like the problem is that cmd.Process.Kill() doesn't kill child processes. See this similar question: Process.Kill() on child processes.

I found a solution in this thread https://groups.google.com/forum/#!topic/golang-nuts/XoQ3RhFBJl8

cmd := exec.Command(some_command)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Start()

pgid, err := syscall.Getpgid(cmd.Process.Pid)
if err == nil {
    // Note the minus sign: a negative ID signals the whole process group.
    syscall.Kill(-pgid, syscall.SIGTERM)
}

cmd.Wait()

As a caveat this will almost certainly not work across platforms - I'm on OSX Yosemite at the moment, and I'd be willing to bet it'd work on most Linuxes as well, but I don't know enough about BSD to have an opinion and I doubt it would work on Windows.

Upvotes: 24

jeffruan

Reputation: 294

Go's defer statement schedules a function call (the deferred function) to be run immediately before the function executing the defer returns.

So the things after defer

defer time.AfterFunc(time.Second*2, func() {
    fmt.Printf("Nobody got time fo that\n")
    cmd.Process.Kill()
    fmt.Printf("It's dead Jim\n")
}).Stop()

would not be executed until func() ends. Therefore, if cmd.Wait() never ends, time.AfterFunc() is never executed.

Moving time.AfterFunc(...) out of the defer fixes this problem, since time.AfterFunc waits for the duration to elapse and then calls f in its own goroutine.

Here is a working version. I tested it on my Ubuntu box and it works. Save the source as wait.go

package main

import (
    "bytes"
    "fmt"
    "os/exec"
    "time"
)

func main() {
    var output bytes.Buffer
    cmd := exec.Command("sleep", "10s")
    cmd.Stdout, cmd.Stderr = &output, &output
    if err := cmd.Start(); err != nil {
        fmt.Printf("command start error\n")
        return
    }
    time.AfterFunc(time.Second*2, func() {
        fmt.Printf("Nobody got time for that\n")
        cmd.Process.Kill()
        fmt.Printf("It's dead Jim\n")
    })
    cmd.Wait()
    fmt.Printf("Done waiting\n")
}

Run the command:

time go run wait.go

Output:

Nobody got time for that
It's dead Jim
Done waiting

real    0m2.481s
user    0m0.252s
sys     0m0.452s

As @James Henstridge commented, the above understanding is incorrect: I had an incomplete understanding of defer. The other half is: "The arguments to the deferred function (which include the receiver if the function is a method) are evaluated when the defer executes". So the timer really is created when the defer statement executes, and it will fire.
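
A small sketch of that rule, independent of os/exec: the tracer type and its functions are made up for illustration. newTracer plays the role of time.AfterFunc (it runs when the defer statement executes), and stop plays the role of Stop (it runs when the enclosing function returns).

```go
package main

import "fmt"

// tracer is a hypothetical type that records when things happen.
type tracer struct{ events *[]string }

// newTracer stands in for time.AfterFunc: it is the receiver expression
// of the deferred method call.
func newTracer(events *[]string) tracer {
    *events = append(*events, "receiver evaluated")
    return tracer{events}
}

// stop stands in for (*time.Timer).Stop: the deferred method itself.
func (t tracer) stop() {
    *t.events = append(*t.events, "deferred call runs")
}

// deferOrder returns the order in which the three steps happen.
func deferOrder() []string {
    var events []string
    func() {
        defer newTracer(&events).stop()
        events = append(events, "function body runs")
    }()
    return events
}

func main() {
    for _, e := range deferOrder() {
        fmt.Println(e)
    }
}
```

This prints "receiver evaluated", then "function body runs", then "deferred call runs", which is why the timer in the question is armed before cmd.Wait() is reached.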

The real question is why the process cannot be killed. I checked the Go package source: on *nix-like systems it sends SIGKILL to kill the process. SIGKILL cannot be blocked or ignored, so there must be another cause, such as the process itself being in the TASK_UNINTERRUPTIBLE state.

Upvotes: -3
