Reputation: 3255
I want a python program that acts as a wrapper around bash tools.
For this purpose, I need to read the first (header) line of each input file into Python and then generate the bash commands from the information found there; the bash tools process all subsequent lines. No line needs to be read twice. See MWE 1 below.
To do this, I open the input files twice: once from Python to read the header line, and a second time from the bash tools invoked via subprocess.
This works if the input files are regular files, but if they are named pipes or the /dev/fd/N-like files used by bash's process substitution, the python program only processes the header lines and then hangs, waiting for further pipe input.
The reason, I suspect, is that the writing end of the pipe receives SIGPIPE after the python process has read the header line and closed its end. This terminates the writer, and the bash tool subprocesses are left with a pipe that has no writing end. I tried trap "" PIPE, but to no avail: the writer process still terminates (MWE 2).
The question is: How can I read one line from a Linux named pipe with one process, then keep it open for another process?
MWE 1: Sample python program
#!/usr/bin/env python3
# --- MWE1.py -------------
import subprocess as sp
import sys

a = sys.argv[1]
b = sys.argv[2]
fd_a = open(a, "rt")
fd_b = open(b, "rt")
# Read the header line of each input file in Python
header = "\t".join([fd_a.readline().rstrip(),
                    fd_b.readline().rstrip()])
print("H: " + header)
# Hand the same paths to a bash tool for the remaining lines
cmd = "paste {} {}".format(a, b)
sp.check_call(["/bin/bash", "-c", cmd], close_fds=False)
Run with:
mkfifo myfifo
cat > file1 << EOF
A B
1 2
3 4
5 6
EOF
cat > file2 << EOF
Y Z
10 11
12 13
14 15
EOF
cat file1 > myfifo &
./MWE1.py myfifo file2
# Prints "H: A B Y Z" and waits for pipe input forever...
MWE 2: Demonstrating the problem in bash
trap "" PIPE # I thought this would prevent exiting
cat file1 > myfifo & strace -p $! -e write,signal
## In another shell...
head -n1 myfifo # terminates "cat file1 > myfifo"
cat myfifo # waits forever for pipe input...
Upvotes: 0
Views: 1114
Reputation: 3255
I've got it!! :-) The solution is to open a file descriptor using os.open. That is simply a number which denotes an open file connected to the program. It is not to be confused with the file objects created by the builtin open function.
Steps:
os.open to get a raw file descriptor
os.set_inheritable(fd, True) to make the descriptor inheritable by child processes
subprocess.Popen (or check_call) with close_fds=False or pass_fds=[...]
Code: (not tested, I admit, but it worked like this in my program)
import os
import sys
import subprocess as sp

# Open -> gives a file descriptor (an integer, corresponds to /dev/fd/*)
fd = os.open(FILENAME, os.O_RDONLY)  # see the os module docs for O_RDONLY
# Wrap it in a standard file object, unbuffered so no extra bytes are consumed.
# Unbuffered I/O requires reading in binary mode.
handle = os.fdopen(fd, 'rb', 0)
# Read one line, do whatever you want in python.
# The resulting bytes object must be decoded to str using an encoding.
header = handle.readline().decode(sys.getdefaultencoding())
header = header.rstrip().split()
# Make fd inheritable by child processes
os.set_inheritable(fd, True)
# Start the child process and hand it the file descriptor as input.
# Using /dev/fd/{fd} or <(cat <&{fd}) (bash only) can turn the descriptor
# into a filename, if the child program needs one.
# The child sees every byte that was not already consumed by the lines above.
sp.check_call("wc -l <&{fd}".format(fd=fd), shell=True, pass_fds=(fd,))
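For illustration, a hedged usage sketch mirroring the setup of MWE 1 (assuming the code above is saved as read_header.py with FILENAME taken from sys.argv[1]; both names are hypothetical):
mkfifo myfifo
cat file1 > myfifo &
python3 read_header.py myfifo
# Python consumes only the header line, so wc -l reports the 3 remaining lines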
Upvotes: 0
Reputation: 42778
In your case, I wouldn't use an external program. Just do it with python:
import sys
from itertools import zip_longest

with open(sys.argv[1], 'rt') as a:
    with open(sys.argv[2], 'rt') as b:
        header = 'H: %s\t%s' % (next(a).rstrip(), next(b).rstrip())
        print(header)
        for m, n in zip_longest(a, b, fillvalue=''):
            print('%s\t%s' % (m.rstrip(), n.rstrip()))
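Because each input is read exactly once, this also works when the inputs are named pipes or process substitutions. A hedged usage sketch (hypothetical script name paste_py.py, reusing the sample files from MWE 1):
python3 paste_py.py <(cat file1) file2
# Prints the joined header line followed by the remaining rows, tab-separated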
Upvotes: 1
Reputation: 7376
Contrary to regular files, pipes are just what the name advertises: data pipes. Once bytes are read, they are removed from the pipe.
So if a and b are opened on the same named pipe and you want to read the same data again, you need the writer to send it again; otherwise the process hangs, waiting for data to read.
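A minimal bash sketch of this behaviour (hypothetical fifo name). Bash's read builtin consumes pipe input one byte at a time, so exactly one line is taken; holding a single read end open lets a second reader see the leftover data, but the already-read header bytes are not replayed:
mkfifo demo_fifo
printf 'H1\nd1\nd2\n' > demo_fifo &
exec 3< demo_fifo        # hold one read end open for both readers
IFS= read -r header <&3  # consumes exactly the line "H1"
echo "header: $header"
cat <&3                  # prints only d1 and d2; the header bytes are gone
exec 3<&-
rm demo_fifo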
Upvotes: 1