Reputation: 1116
Scenario:
- SIGPIPE is blocked to prevent erroring out when the subprocess dies (sigtimedwait is used to clear the signal)
- The subprocess is killed with -9 (it's a dedicated test)
- The machine is under load (stress -c 96). A race condition, yay!

It looks like the difference between the successful and the hung runs is this:
- Successful runs: the write attempt fails with errno=EPIPE, which is then handled
- Failed runs: the first 64K are written successfully (presumably filling the buffer), the next write returns -1 with errno=EAGAIN, then epoll_wait is called with EPOLLOUT and without a timeout (to wait for the buffer to clear under normal circumstances) - AND it never returns
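To make the two paths concrete, here is a minimal sketch of the write loop as described above. It is not the actual code: the helper names setup_writer and write_all are hypothetical, and it assumes a Linux system, a non-blocking write end of a pipe, and an epoll instance that watches that fd for EPOLLOUT only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical setup: block SIGPIPE, make fd non-blocking, and register it
 * with a fresh epoll instance for EPOLLOUT. Returns the epoll fd or -1. */
int setup_writer(int fd)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Hypothetical write loop. Returns 0 once all bytes are written, or -1 with
 * errno set (EPIPE when the read end of the pipe has been closed). */
int write_all(int epfd, int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n >= 0) {
            off += (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1; /* successful runs end up here with EPIPE */

        /* Pipe buffer is full: wait with no timeout until it is writable.
         * In the failed runs this is the call that never returns. */
        struct epoll_event ev;
        int r;
        do {
            r = epoll_wait(epfd, &ev, 1, -1);
        } while (r < 0 && errno == EINTR);
        if (r < 0)
            return -1;
    }
    return 0;
}
```

With the read end already closed, write_all returns -1 with errno=EPIPE on the first write, which matches the successful runs. The hung runs only differ in that the buffer fills up first, so the loop reaches epoll_wait.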
I tried:
- Adding EPOLLHUP and EPOLLERR, although those should trigger regardless
- Adding EPOLLRDHUP, which was promising, but achieved nothing
- Checking whether the subprocess is alive before calling epoll_wait, but it's always shown as alive (tried both kill(pid, 0) and waitpid(pid, &status, WNOHANG)). Plus, it would still be racy anyway
- Checking sigtimedwait's output: it returns -1 with errno=EAGAIN (sadly no EPIPE, as I could handle that)

Note 1: This seems to be a quirk of epoll, and if so, I'd appreciate a recommended workaround
Note 2: I have a solution that works around the hang, but I'm not sure it's a great one - so will hold off sharing it to not skew the answers
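For reference, the signal and liveness checks from the "I tried" list look roughly like this. This is a sketch under assumptions, not the real code: the helper names drain_sigpipe, child_exists and child_reaped are hypothetical, and drain_sigpipe assumes SIGPIPE is already blocked in the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: consume a pending SIGPIPE without waiting.
 * Returns 1 if one was pending, 0 if none (sigtimedwait failed with EAGAIN),
 * -1 on any other error. */
int drain_sigpipe(void)
{
    sigset_t set;
    struct timespec zero = { 0, 0 }; /* zero timeout: poll, do not block */
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);

    for (;;) {
        int sig = sigtimedwait(&set, NULL, &zero);
        if (sig == SIGPIPE)
            return 1;
        if (sig < 0 && errno == EAGAIN)
            return 0;
        if (sig < 0 && errno == EINTR)
            continue;
        return -1;
    }
}

/* Liveness check via kill(pid, 0). Note that this also succeeds for a child
 * that has exited but has not been reaped yet (a zombie). */
int child_exists(pid_t pid)
{
    return kill(pid, 0) == 0;
}

/* Liveness check via waitpid(..., WNOHANG).
 * Returns 1 if the child has terminated and was reaped, 0 if it has not
 * changed state yet, -1 on error. */
int child_reaped(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}
```

Both liveness checks are snapshots, so even when they work there is still a window between the check and the epoll_wait call.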
Upvotes: 1
Views: 53