Bodo Hugo Barwich
Bodo Hugo Barwich

Reputation: 241

Why do Pipes from Child Processes break sometimes and sometimes not?

I am working on a System in Perl with a Master Script that needs to execute several different Worker Scripts that execute different Tasks on the System at the same time.
Those Scripts do not need to be written in Perl as well. They can be any Programm that is executeable on the Command Line and returns the Results over the STDOUT and STDERR and the EXITCODE.

So I made a little Example Script hello.pl:

#!/usr/bin/perl

print STDERR "HELLO ERROR 0 !\n";

print "hello script!!\n";

print "hello waiting 3 sec ...\n";

sleep 3;

print "hello list:\n";
print "cmd: 'ls -lah'\n";

print `ls -lah`;

print "res: '$?'\n";    

print "hello waiting 2 sec ...\n";

sleep 2;    

print STDERR "HELLO ERROR 1 !\n";

The Master Script needs to capture the Result of the STDOUT on one side and the Error Notifications in STDERR on the other side.

The Master Script does a fork to execute the Worker Script in a non-blocking mode and uses waitpid and select to monitor the Worker Script progress.

So the Code of the master.pl Script is this:

#!/usr/bin/perl

use strict;
use warnings;

use IO::Select;
use POSIX ":sys_wait_h";


sub Launch
{
  my $rprocess = shift;
  my $irs = 0;


  local *logreader;
  local *errorreader;
  my $logwriter   = undef;
  my $errorwriter = undef;

  pipe( *logreader,   $logwriter );
  pipe( *errorreader, $errorwriter );

  #Spawn the Child Process
  my $iprcpid = fork();

  #Check the Success of Process Forking
  if(defined $iprcpid)
  {           
    #------------------------
    #Sub Process Launch succeeded

    # Check whether parent/child process
    if($iprcpid > 0)
    {
      #------------------------
      #Parent Process

      close($logwriter);
      close($errorwriter);

      $rprocess->{"pid"} = $iprcpid;
      $rprocess->{"log_pipe"} = *logreader;
      $rprocess->{"error_pipe"} = *errorreader;

      $irs = 1;
    }
    elsif($iprcpid == 0)
    {
      #------------------------
      #Child Process

      my $ierr = 0;

      close(*logreader);
      close(*errorreader);

      open( STDOUT, ">&=", $logwriter );
      open( STDERR, ">&=", $errorwriter );


      #------------------------
      #Execute the configured Command

      print "cmd: '" . $rprocess->{"file"} . "'\n";

      print "cmd rng ...\n";

      print `$rprocess->{"file"}`; 

      $ierr = $?;

      print "closing transmission ...\n";   

      close STDOUT;
      close STDERR;

      exit $ierr; 
    }
    else    #An Error has ocurred in the Sub Process Launch
    {
        # Unable to fork
        print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!\n";
    }   #if($iprcpid > 0)
  }
  else    #An Error has ocurred in the Process Spawning   
  {
    # Unable to fork
    print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!\n";
  }  #if(defined $iprcpid)

  return $irs;
}

sub Read
{
  my $rprocess = shift;

  my $ppsel = $rprocess->{"select"};
  my $pplog = $rprocess->{"log_pipe"};
  my $pperr = $rprocess->{"error_pipe"};
  my @arrppselrdy = undef;
  my $pprdy = undef;
  my $srdln = "";
  my $irdcnt = 0;

  unless(defined $ppsel)
  {
    $ppsel = IO::Select->new();

    $ppsel->add($pplog);
    $ppsel->add($pperr);

    #Store the Selector Object
    $rprocess->{"select"} = $ppsel;
  }
  else  #The Selector was already created
  {
    $ppsel = $rprocess->{"select"};
  }  #unless(defined $ppsel)    

  while(@arrppselrdy = $ppsel->can_read(1))
  {
    foreach $pprdy (@arrppselrdy)
    {
      $irdcnt = sysread($pprdy, $srdln, 8192);

      if(defined $irdcnt)
      {
        if($irdcnt > 0)
        {
          if(fileno($pprdy) == fileno($pplog))
          {
            $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading report ...\n";

            $rprocess->{"log"} .= $srdln;
          }
          elsif(fileno($pprdy) == fileno($pperr))
          {
            $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading error ...\n";

            $rprocess->{"error"} .= $srdln;
          }   #if(fileno($pprdy) == fileno($pplog))
        }
        else    #End of Transmission
        {
          $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): transmission done.\n";

          #Remove the Pipe File Handle
          $ppsel->remove($pprdy);  

        } #if($irdcnt > 0)
      }
      else  #Reading from the Pipe failed
      {
        #Remove the Pipe File Handle
        $ppsel->remove($pprdy);

        if($!)
        {
          $rprocess->{"error"} .= "ERROR: Sub Process " . $rprocess->{"name"} 
            . ": Read failed with [" . ($! * 1) . "]!\n"
            . "Message: '$!'\n";
        }  #if($!)
      }  #if(defined $irdcnt)
    }  #foreach $pprdy (@arrppselrdy)
  } #while(@arrppselrdy = $ppsel->can_read(1))

  $rprocess->{"log"} .= "try read done. '" . $ppsel->count . "' pipes left.\n";
}

sub Check
{
  my $rprocess = shift;
  my $irng = 0;

  if($rprocess->{"pid"} > 0
    && $rprocess->{"status"} < 0)
  {
    my $ifinishpid = waitpid($rprocess->{"pid"}, WNOHANG);

    if($ifinishpid > -1)
    {
      if($ifinishpid == 0)
      {
        $irng = 1;
      }
      else
      {
        $rprocess->{"status"} = $?;
      }

      #Read the Message Pipes
      Read($rprocess);

      if($ifinishpid > 0)
      {
        close $rprocess->{"log_pipe"};
        close $rprocess->{"error_pipe"};
      }
    }
    else  #Sub Process does not exist
    {
      $rprocess->{"error"} .= "ERROR: Sub Process does not exist!\n";
    }  #if($ifinishpid > 0)
  }  #if($rprocess->{"pid"} > 0 && $rprocess->{"status"} < 0)

  return $irng;
}



my %hshprocesses = ("hello1" => {"name" => "hello1", "pid" => -1, "file" => "./hello.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello2" => {"name" => "hello2", "pid" => -1, "file" => "sleep 3 ; ./hello2.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello3" => {"name" => "hello3", "pid" => -1, "file" => "./hello3.pl ; sleep 2"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello4" => {"name" => "hello4", "pid" => -1, "file" => "./hello4.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1});


print "prcs launching ...\n";

foreach (keys %hshprocesses)
{
  unless(Launch($hshprocesses{$_}))
  {
    print "prc '" . $hshprocesses{$_}{"name"} . "': Launch failed!\n";
  }
}  #foreach (keys %hshprocesses)

print "prcs launched.\n";

my $irunningcount = 0;

do
{
  $irunningcount = 0;

  foreach (keys %hshprocesses)
  {
    $irunningcount++ if(Check($hshprocesses{$_}));
  }  #foreach (keys %hshprocesses)

  print "prc rng cnt: '$irunningcount'\n";
}
while($irunningcount > 0);

foreach (keys %hshprocesses)
{
  print "process (" . $hshprocesses{$_}{"pid"} . ") '" . $hshprocesses{$_}{"name"} . "':\n";
  print "status [" . $hshprocesses{$_}{"status"} . "]\n";
  print "log:\n"
    . "'" . $hshprocesses{$_}{"log"} . "'\n";
  print "error:\n"
    . "'" . $hshprocesses{$_}{"error"} . "'\n";
}  #foreach (keys %hshprocesses)

So reading on logreader and errorreader I thought I would be able to capture the whole output.

The expected Report would be like this:

$ ./master.pl
prcs launching ...
prcs launched.
prc rng cnt: '4'
prc rng cnt: '3'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '0'
process (2742) 'hello4':
status [0]
log:
'pipe (3): reading report ...
cmd: './hello4.pl'
cmd rng ...
pipe (5): reading error ...
try read done. '2' pipes left.
pipe (5): reading error ...
pipe (3): reading report ...
hello4 script!!
hello waiting 3 sec ...
hello4 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
pipe (3): reading report ...
closing transmission ...
pipe (5): transmission done.
pipe (3): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO4 ERROR 0 !
HELLO4 ERROR 1 !
'
process (2743) 'hello1':
status [0]
log:
'pipe (4): reading report ...
cmd: 'sleep 3 ; ./hello.pl'
cmd rng ...
try read done. '2' pipes left.
pipe (7): reading error ...
try read done. '2' pipes left.
try read done. '2' pipes left.
pipe (7): reading error ...
pipe (4): reading report ...
hello script!!
hello waiting 3 sec ...
hello list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (7): transmission done.
pipe (4): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO ERROR 0 !
HELLO ERROR 1 !
'
process (2745) 'hello3':
status [0]
log:
'pipe (6): reading report ...
cmd: './hello3.pl ; sleep 2'
cmd rng ...
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (6): reading report ...
hello3 script!!
hello waiting 3 sec ...
hello3 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (9): transmission done.
pipe (6): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO3 ERROR 0 !
HELLO3 ERROR 1 !
'
process (2746) 'hello2':
status [0]
log:
'pipe (8): reading report ...
cmd: './hello2.pl'
cmd rng ...
pipe (11): reading error ...
try read done. '2' pipes left.
pipe (8): reading report ...
hello2 script!!
hello waiting 3 sec ...
hello2 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (11): reading error ...
pipe (8): transmission done.
pipe (11): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO2 ERROR 0 !
HELLO2 ERROR 1 !
'

But this works only if there is only 1 Worker Script.
If the Master Script tries to read on several Worker Scripts this Transmission Pipes are closed by the Child right before executing the Worker Script hello.pl . The last thing that is read is:

        cmd: './hello.pl'
        cmd rng ...

The Master Script uses the IO::Select Module to read from logreader and errorreader with a selector Object.
That way I find in the Progress Log:

        cmd: './hello.pl'
        cmd rng ...
        pipe (8): transmission done.

But all the stuff from hello.pl is lost.

Sometimes I still find the first line of the hello.pl STDERR in the Error Report

HELLO ERROR 0 !

Why does the Pipe sometimes break and sometimes not?
How can I prevend this from happening?

Now with this Test Script I could not reproduce the Error of the Broken Pipe.
But even in the Real World Application it sometimes happens and sometimes not.
I developped this Application on a System with perl v5.10 and from 4 Processes 3 run and 1 breaks. Especially if it does a sleep call.

Under which circumstances could a Pipe break?


edit 2018-05-05: I built the Master Script master.pl and run it with perl v5.24.1

Upvotes: 0

Views: 705

Answers (2)

Bodo Hugo Barwich
Bodo Hugo Barwich

Reputation: 241

Since nobody could give me any useful Answer or even Hint in 1 month I will post what I have found out about the Phenomenon of the Broken Pipe in order to help others that might face a similar problem.


Finally I could track down the Phenomenon of the Broken Pipe.
I added long enough sleeping time that would let me track the System Activity with the strace Command and it actually documents the Broken Pipe Error:

$ strace -p 11729
Process 11729 attached
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, "hello script!!\nhello waiting 20 "..., 67) = 67
pipe([3, 4])                            = 0
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe670d5f9d0) = 13238
close(6)                                = 0
close(4)                                = 0
read(5, "", 4)                          = 0
close(5)                                = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff334b57b0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "total 228K\ndrwxr-xr-x.  4 usr15 "..., 4096) = 1868
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
wait4(13238, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13238
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13238, si_status=0, si_utime=0, si_stime=0} ---
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({30, 0}, 0x7fff334b5b90)      = 0
write(2, "HELLO ERROR 1 !\n", 16)       = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=11729, si_uid=501} ---
+++ killed by SIGPIPE +++

It documents as supposed that the hello.pl completes correctly and ends writing the Error Message write(2, "HELLO ERROR 1 !\n", 16) = -1
But it finds that the Pipe is broken, causing an Error EPIPE (Broken pipe) which causes it to crash with +++ killed by SIGPIPE +++

So the Answer to the Question
Under which circumstances could a Pipe break? is

  • If the forked Child Process was already terminated.

But why couldn't I read the Output before the Pipe is broken ?

The Phenomenon of the Cut Output

    cmd: './hello.pl'
    cmd rng ...
    pipe (8): transmission done.
  • results from a combination of print() in the forked Child Process and the sysread() in the Parent Process.
    The Output is actually buffered on the Child side and finally written to the Parent side.

Why should the Child Process be terminated before receiving the Output?

  • This comes from a Combination with Objecthood in the actual Real World Application which implements the Object::DESTROY Method.
    This Method includes a kill() Cleanup Logic for still running Child Processes which theoretically should not be triggered.
    But in a Real World Application the Processes are not inmediately read and reaped what causes them to be killed by the Object::DESTROY Logic.
    But I cannot find anthing of this in the Execution Log because it Object was already destroyed by Perl's Garbage Collector.

Therefore some Processes get killed but others not. And sometimes they get killed but other times not.

Upvotes: 2

7 Reeds
7 Reeds

Reputation: 2539

I think you have to use wait and not waitpid. wait waits for any child while waitpid only waits on one specific child.

Upvotes: 0

Related Questions