Reputation: 804
Summary: I'd like to write python scripts that act like bash scripts on the command line, but then I'd also like to pipe them together easily in python. Where I'm having trouble is the glue to make the latter happen.
So imagine I wrote two scripts, script1.py and script2.py, and I can pipe them together like so:
echo input_string | ./script1.py -a -b | ./script2.py -c -d
How do I get this behavior from within another python file? Here's the way I know, but I don't like:
arg_string_1 = convert_to_args(param_1, param_2)
arg_string_2 = convert_to_args(param_3, param_4)
output_string = subprocess.check_output(
    "echo " + input_string + " | ./script1.py " + arg_string_1 + " | ./script2.py " + arg_string_2,
    shell=True)
If I didn't want to take advantage of multithreading, I could do something like this (?):
input1 = StringIO(input_string)
output1 = StringIO()
script1.main(param_1, param_2, input1, output1)

input2 = StringIO(output1.getvalue())
output2 = StringIO()
script2.main(param_3, param_4, input2, output2)
Here's the approach I was trying, but I got stuck at writing the glue. I'd appreciate either learning how to finish my approach below, or suggestions for a better design/approach!
My approach: I wrote script1.py and script2.py to look like:
#!/usr/bin/python3
...  # import sys and define "parse_args"

def main(param_1, param_2, input, output):
    for line in input:
        ...
        print(stuff, file=output)

if __name__ == "__main__":
    parameter_1, parameter_2 = parse_args(sys.argv)
    main(parameter_1, parameter_2, sys.stdin, sys.stdout)
Then I wanted to write something like this, but don't know how to finish:
pipe_out, pipe_in = ????
output = StringIO()
thread_1 = Thread(target=script1.main, args=(param_1, param_2, StringIO(input_string), pipe_out))
thread_2 = Thread(target=script2.main, args=(param_3, param_4, pipe_in, output))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
output_str = output.getvalue()
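For reference, one way glue like this is sometimes written is with os.pipe() wrapped in text-mode file objects; a minimal sketch, assuming script1.main and script2.main take file-like input and output arguments exactly as in the skeleton above:

import os
from io import StringIO
from threading import Thread

# Hypothetical glue, assuming script1.main/script2.main read from and write to
# file-like objects as in the skeleton above.
read_fd, write_fd = os.pipe()
pipe_in = os.fdopen(read_fd, "r")    # read end, handed to script2 as its input
pipe_out = os.fdopen(write_fd, "w")  # write end, handed to script1 as its output

output = StringIO()

def run_script1():
    script1.main(param_1, param_2, StringIO(input_string), pipe_out)
    pipe_out.close()  # closing the write end gives script2 EOF so its read loop ends

thread_1 = Thread(target=run_script1)
thread_2 = Thread(target=script2.main, args=(param_3, param_4, pipe_in, output))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
output_str = output.getvalue()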
Upvotes: 10
Views: 2340
Reputation: 696
There is a very simple solution using the standard Popen class.
Here's an example:
#this is the master python program
import subprocess
import sys
import os
#note the use of stdin and stdout arguments here
process1 = subprocess.Popen(['./script1.py'], stdin=sys.stdin, stdout=subprocess.PIPE)
process2 = subprocess.Popen(['./script2.py'], stdin=process1.stdout)
process1.wait()
process2.wait()
The two scripts are:
#!/usr/bin/env python
# script1.py
import sys

for line in sys.stdin:
    print(line.strip().upper())
Here's the second one:
#!/usr/bin/env python
# script2.py
import sys

for line in sys.stdin:
    print("<{}>".format(line.strip()))
Upvotes: 1
Reputation: 1639
Redirect the return value to stdout depending on whether the script is being run from the command line:
#!/usr/bin/python3
import sys

# Example function
def main(input):
    # Do something with input producing stuff
    ...
    return multipipe(stuff)

if __name__ == '__main__':
    def multipipe(data):
        print(data)

    input = parse_args(sys.argv)
    main(input)
else:
    def multipipe(data):
        return data
Each other script will have the same two definitions of multipipe. Now, use multipipe for output.

If you call all the scripts together from the command line, $ ./script1.py | ./script2.py, each will have __name__ == '__main__' and so multipipe will print it all to stdout to be read on stdin by the next script (and return None, so each function returns None, but you're not looking at the return values anyway in this case).

If you call them within some other python script, each function will return whatever you passed to multipipe.

Effectively, you can use your existing functions; just replace print(stuff, file=output) with return multipipe(stuff). Nice and simple.
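In the imported case, a caller can then chain the functions directly; a rough sketch, assuming script1.py and script2.py follow the multipipe pattern above and each main takes a single input argument:

# Hypothetical driver; because the scripts are imported (not run as __main__),
# multipipe simply returns its data instead of printing it.
import script1
import script2

intermediate = script1.main(input_string)
output_string = script2.main(intermediate)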
To use it with multithreading or multiprocessing, set the functions up so that each function returns a single thing, and plug them into a simple function that adds data to a multithreading queue. For an example of such a queueing system, see the sample at the bottom of the Queue docs. With that example, just make sure that each step in the pipeline puts None (or other sentinel value of your choice - I like ... for that since it's extremely rare that you'd pass the Ellipsis object for any reason other than as a marker for its singleton-ness) in the queue to the next one to signify done-ness.
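A minimal sketch of that sentinel idea (the stage function and the queues here are illustrative, not from the question):

import queue
from threading import Thread

SENTINEL = ...  # the Ellipsis object, used only as an end-of-data marker

def stage(func, in_q, out_q):
    # Apply func to every item from in_q and forward the sentinel when done.
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(func(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
Thread(target=stage, args=(str.upper, q1, q2)).start()
Thread(target=stage, args=("<{}>".format, q2, q3)).start()

for line in ["one", "two"]:
    q1.put(line)
q1.put(SENTINEL)

results = []
while True:
    item = q3.get()
    if item is SENTINEL:
        break
    results.append(item)
print(results)  # ['<ONE>', '<TWO>']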
Upvotes: 1
Reputation: 2691
For the "pipe in", uses sys.stdin
with the readlines()
method. (Using method read()
would read one character at a time.)
For passing information from one thread to another, you can use Queue. You must define one way to signal the end of data. In my example, since all the data passed between threads is str, I simply use a None object to signal the end of data (since it cannot appear in the transmitted data).
One could also use more threads, or use different functions in threads.
I did not include the sys.argv handling in my example to keep it simple. Modifying it to get parameters (parameter1, ...) should be easy.
import sys
from threading import Thread
from Queue import Queue  # Python 2; in Python 3 this is "from queue import Queue"

def stdin_to_queue(output_queue):
    for inp_line in sys.stdin.readlines():    # input one line at a time
        output_queue.put(inp_line, True, None)  # blocking, no timeout
    output_queue.put(None, True, None)          # signal the end of data

def main1(input_queue, output_queue, arg1, arg2):
    do_loop = True
    while do_loop:
        inp_data = input_queue.get(True)
        if inp_data is None:
            do_loop = False
            output_queue.put(None, True, None)  # signal end of data
        else:
            out_data = arg1 + inp_data.strip('\r\n').upper() + arg2  # or whatever transformation...
            output_queue.put(out_data, True, None)

def queue_to_stdout(input_queue):
    do_loop = True
    while do_loop:
        inp_data = input_queue.get(True)
        if inp_data is None:
            do_loop = False
        else:
            sys.stdout.write(inp_data + '\n')  # re-add the newline stripped in main1

def main():
    q12 = Queue()
    q23 = Queue()
    q34 = Queue()
    t1 = Thread(target=stdin_to_queue, args=(q12,))
    t2 = Thread(target=main1, args=(q12, q23, '(', ')'))
    t3 = Thread(target=main1, args=(q23, q34, '[', ']'))
    t4 = Thread(target=queue_to_stdout, args=(q34,))
    t1.start()
    t2.start()
    t3.start()
    t4.start()

main()
Finally, I tested this program (python2) with a text file.
head sometextfile.txt | python script.py
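As a quick check, assuming the program is saved as script.py, echo hello | python script.py should print [(HELLO)], since the first worker wraps the upper-cased line in parentheses and the second wraps that in brackets.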
Upvotes: 2