akraf

Reputation: 3255

Read from a named pipe in two parts in python: First python, then subprocess

I want a python program that acts as a wrapper around bash tools.

To do this, I need to read the first (header) line of each input file into python, use that information to generate the bash commands, and let those commands process all subsequent lines. No line needs to be read twice. See MWE 1 below.

This means the input files are opened twice: once by python to read the first line, and a second time by the bash tools invoked via subprocess.

This works if the input files are regular files. But if they are named pipes, or /dev/fd/N-like files created by bash's process substitution, the python program only processes the header lines and then hangs, waiting for further pipe input.

I suspect the reason is that the writing end of the pipe receives SIGPIPE after the python process has read the header line. This terminates the writer, and the bash tools subprocesses are left with a pipe that has no writing end. I tried trap "" PIPE, but to no avail: the writer process still terminates (MWE 2).

The question is: How can I read one line from a Linux named pipe with one process, then keep it open for another process?


MWE 1: Sample python program

#!/usr/bin/env python3
# --- MWE1.py -------------
import subprocess as sp
import sys

a = sys.argv[1]
b = sys.argv[2]

# First open: python reads only the header line of each input
fd_a = open(a, "rt")
fd_b = open(b, "rt")

header = "\t".join([fd_a.readline().rstrip(),
                    fd_b.readline().rstrip()])
print("H: " + header)

# Second open: the bash tools open the same paths again by name
cmd = "paste {} {}".format(a, b)

sp.check_call(["/bin/bash", "-c", cmd], close_fds=False)

Run with:

mkfifo myfifo
cat > file1 << EOF
a   b
1   2
3   4
5   6
EOF
cat > file2 << EOF
Y   Z
10  11
12  13
14  15
EOF
cat file1 > myfifo &
./MWE1.py myfifo file2
# Prints "H: a   b   Y   Z" and waits for pipe input forever...

MWE 2: Demonstrating the problem in bash

trap "" PIPE  # I thought this would prevent exiting
cat file1 > myfifo & strace -p $! -e write,signal

## In another shell...
head -n1 myfifo # terminates "cat file1 > myfifo"
cat myfifo # waits forever for pipe input...
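For context on MWE 2: ignoring SIGPIPE (which is what trap "" PIPE does, and the ignored disposition is inherited by commands the shell starts) does not let a writer keep writing into a pipe that has lost its reader. The write simply fails with EPIPE instead of the process being killed by the signal, and a writer such as cat stops on that write error. A minimal sketch, assuming Linux/POSIX; the function name write_to_readerless_pipe is made up for illustration:

```python
import errno
import os
import signal


def write_to_readerless_pipe():
    """Return the errno name of a write into a pipe whose read end is closed."""
    # Ignore SIGPIPE explicitly (python already does this at startup); this is
    # the same disposition that `trap "" PIPE` passes on to child commands.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    r, w = os.pipe()
    os.close(r)  # no reader left, like after `head -n1 myfifo` has exited
    try:
        os.write(w, b"1\t2\n")
    except BrokenPipeError as exc:  # python raises this for EPIPE
        return errno.errorcode[exc.errno]
    finally:
        os.close(w)
    return None


print(write_to_readerless_pipe())  # EPIPE
```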

Upvotes: 0

Views: 1114

Answers (3)

akraf

Reputation: 3255

I've got it!! :-) The solution is to open a file descriptor using os.open. A file descriptor is simply an integer that refers to a file opened by the process. It is not to be confused with the file objects created by the built-in open function.

Steps:

  1. Open by os.open
  2. Make it inheritable by child processes using os.set_inheritable(fd, True)
  3. Pass the file descriptors to the child process using the subprocess.Popen parameters close_fds=False or pass_fds=[...]
  4. (edit) Make sure to use unbuffered I/O in Python; otherwise lines of text beyond the header can end up in a Python buffer, where they are unavailable to the subprocess
  5. Profit!
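To illustrate why step 4 matters, here is a small sketch (assuming Linux and Python 3; the helper name leftover_after_readline is hypothetical). It puts three lines into an anonymous pipe, reads one line through a file object, and then checks what a child reading the raw descriptor would still find. With buffering=0, readline() stops right after the newline; with the default buffering, the first read pulls the whole pipe content into Python's buffer.

```python
import os


def leftover_after_readline(buffering):
    """Write three lines into a pipe, read one line through a file object
    with the given buffering, and return what is still left in the pipe."""
    r, w = os.pipe()
    os.write(w, b"a\tb\n1\t2\n3\t4\n")
    os.close(w)  # writer is done; once the pipe is empty, reads return EOF
    handle = os.fdopen(r, "rb", buffering)
    header = handle.readline()
    assert header == b"a\tb\n"
    # Read the raw descriptor directly, as a child process would.
    rest = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        rest += chunk
    handle.close()  # also closes r
    return rest


print(leftover_after_readline(0))   # unbuffered: b'1\t2\n3\t4\n'
print(leftover_after_readline(-1))  # default buffering: b''
```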

Code: (not tested, I admit, but it worked like this in my program)

import os
import sys
import subprocess as sp

FILENAME = sys.argv[1]  # input path: regular file, named pipe or /dev/fd/N

# Open -> gives file descriptor (integer, corresponds to /dev/fd/*)
fd = os.open(FILENAME, os.O_RDONLY)  # see os module doc for O_RDONLY
# This gives a standard file object (buffering=0 -> unbuffered):
handle = os.fdopen(fd, 'rb', 0)

# Read one line, do whatever you want in python.
# Read in using unbuffered I/O, this requires reading in binary mode.
# The resulting bytes array must be converted to string using an encoding
header = handle.readline().decode(sys.getdefaultencoding())
header = header.rstrip().split()

# Make fd inheritable by children processes
os.set_inheritable(fd, True)

# Open child process, pass file descriptor as input.
# Using /dev/fd/{fd} or <(cat <&{fd}) (bash only) can turn a file descriptor stream
# into a file, if needed for child program.
# This child process sees any input which is not already consumed by the
# above lines.
# The command string uses a shell redirection, so it needs shell=True.
sp.check_call("wc -l <&{fd}".format(fd=fd), shell=True, pass_fds=(fd,))
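Applied to the two-file paste case from MWE 1, the same idea could look roughly like this sketch. It assumes bash is on the PATH and uses the <(cat <&N) form mentioned in the code comments rather than /dev/fd/N, because on Linux reopening /dev/fd/N of a regular file starts again at offset 0, whereas cat <&N continues from where the header read stopped. The function name paste_after_headers is hypothetical.

```python
import os
import subprocess as sp


def paste_after_headers(path_a, path_b):
    """Read each header line in python, then let `paste` handle the rest.

    Returns (joined_header, pasted_body)."""
    fds = [os.open(p, os.O_RDONLY) for p in (path_a, path_b)]
    try:
        headers = []
        for fd in fds:
            # Unbuffered, so readline() consumes exactly one line;
            # closefd=False keeps the raw fd open for bash afterwards.
            with os.fdopen(fd, "rb", 0, closefd=False) as raw:
                headers.append(raw.readline().decode().rstrip("\n"))
        # Each `cat <&N` reads from the already-advanced descriptor N.
        cmd = "paste <(cat <&{}) <(cat <&{})".format(*fds)
        body = sp.run(["bash", "-c", cmd], pass_fds=fds, check=True,
                      stdout=sp.PIPE, universal_newlines=True).stdout
        return "\t".join(headers), body
    finally:
        for fd in fds:
            os.close(fd)
```

For example, header, body = paste_after_headers("myfifo", "file2") followed by print("H: " + header) and print(body, end="") would mirror MWE 1 without reopening the pipe by name.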

Upvotes: 0

Daniel

Reputation: 42778

In your case, I wouldn't use an external program. Just do it with python:

import sys
from itertools import zip_longest

with open(sys.argv[1], 'rt') as a:
    with open(sys.argv[2], 'rt') as b:
        # The first line of each file is the header
        header = 'H: %s\t%s' % (next(a).rstrip(), next(b).rstrip())
        print(header)
        # Pair the remaining lines; pad the shorter file with empty strings
        for m, n in zip_longest(a, b, fillvalue=''):
            print('%s\t%s' % (m.rstrip(), n.rstrip()))

Upvotes: 1

cadrian

Reputation: 7376

Unlike regular files, pipes are just what the name says: data pipes. Once bytes are read, they are removed.

So if a and b are opened on the same named pipe and you want to read the same data again, the writer has to send it again; otherwise the reading process hangs, waiting for data that never comes.
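A small sketch of that behaviour, assuming Linux/POSIX (read_fifo_with_two_readers is a hypothetical name): two descriptors opened on the same named pipe share one stream of bytes, so whatever the first reader takes is gone for the second. O_NONBLOCK is used only so that the read-side opens do not wait for a writer.

```python
import os
import tempfile


def read_fifo_with_two_readers():
    """Open one FIFO twice for reading; return what each read gets."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "myfifo")
        os.mkfifo(path)
        # O_NONBLOCK: opening for reading returns although no writer exists yet.
        a = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        b = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        w = os.open(path, os.O_WRONLY)  # succeeds at once: readers are present
        os.write(w, b"header\nbody\n")
        os.close(w)
        first = os.read(a, 7)      # reader "a" consumes the header line
        second = os.read(b, 4096)  # reader "b" only sees what is left
        third = os.read(a, 4096)   # pipe empty and no writer left: EOF
        os.close(a)
        os.close(b)
        return first, second, third


print(read_fifo_with_two_readers())  # (b'header\n', b'body\n', b'')
```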

Upvotes: 1
