usul

Reputation: 804

How to make python scripts pipe-able both in bash and within python

Summary: I'd like to write python scripts that act like bash scripts on the command line, but then I'd also like to pipe them together easily in python. Where I'm having trouble is the glue to make the latter happen.

So imagine I wrote two scripts, script1.py and script2.py and I can pipe them together like so:

echo input_string | ./script1.py -a -b | ./script2.py -c -d

How do I get this behavior from within another python file? Here's the way I know, but I don't like:

arg_string_1 = convert_to_args(param_1, param_2)
arg_string_2 = convert_to_args(param_3, param_4)
output_string = subprocess.check_output(
    "echo " + input_string + " | ./script1.py " + arg_string_1 + " | ./script2.py " + arg_string_2,
    shell=True)

If I didn't want to take advantage of multithreading, I could do something like this (?):

input1  = StringIO(input_string)
output1 = StringIO()
script1.main(param_1, param_2, input1, output1)
input2  = StringIO(output1.getvalue())
output2 = StringIO()
script2.main(param_3, param_4, input2, output2)

Here's the approach I was trying, but I got stuck at writing the glue. I'd appreciate either learning how to finish my approach below, or suggestions for a better design/approach!

My approach: I wrote script1.py and script2.py to look like:

#!/usr/bin/python3

... # import sys and define "parse_args"

def main(param_1, param_2, input, output):
    for line in input:
        ...
        print(stuff, file=output)

if __name__ == "__main__":
    parameter_1, parameter_2 = parse_args(sys.argv)
    main(parameter_1, parameter_2, sys.stdin, sys.stdout)

Then I wanted to write something like this, but don't know how to finish:

pipe_out, pipe_in = ????
output = StringIO()
thread_1 = Thread(target=script1.main, args=(param_1, param_2, StringIO(input_string), pipe_out))
thread_2 = Thread(target=script2.main, args=(param_3, param_4, pipe_in, output))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
output_str = output.getvalue()

Upvotes: 10

Views: 2340

Answers (3)

Yoav Kleinberger

Reputation: 696

There is a very simple solution using the standard Popen class.

Here's an example:

#this is the master python program
import subprocess
import sys

#note the use of stdin and stdout arguments here
process1 = subprocess.Popen(['./script1.py'], stdin=sys.stdin, stdout=subprocess.PIPE)
process2 = subprocess.Popen(['./script2.py'], stdin=process1.stdout)

process1.wait()
process2.wait()
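
If the calling program also needs the final result back as a string (the question's output_string) rather than letting it reach the terminal, a minimal variation of the same idea is to feed process1 through a pipe and read process2 with communicate(). This is only a sketch, with input_string standing in for whatever data you actually have:

#master program, variant that captures the output
import subprocess

input_string = "some input\n"   # placeholder data

process1 = subprocess.Popen(['./script1.py'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process2 = subprocess.Popen(['./script2.py'],
                            stdin=process1.stdout, stdout=subprocess.PIPE)

process1.stdin.write(input_string.encode())
process1.stdin.close()            # EOF for script1
process1.stdout.close()           # drop the parent's copy of the pipe
output_string = process2.communicate()[0].decode()
process1.wait()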

The two scripts are:

#!/usr/bin/env python
#script1.py
import sys

for line in sys.stdin:
    print(line.strip().upper())

Here's the second one:

#!/usr/bin/env python
#script2.py
import sys

for line in sys.stdin:
    print("<{}>".format(line.strip()))

Upvotes: 1

Vivian

Reputation: 1639

Redirect the return value to stdout depending on whether the script is being run from the command line:

#!/usr/bin/python3
import sys

# Example function
def main(input):
    # Do something with input producing stuff
    ...
    return multipipe(stuff)

if __name__ == '__main__':
    def multipipe(data):
        print(data)

    input = parse_args(sys.argv)
    main(input)
else:
    def multipipe(data):
        return data

Every other script will have the same two definitions of multipipe. Now, use multipipe for all output.

If you call all the scripts together from the command line, $ ./script1.py | ./script2.py, each will have __name__ == '__main__', so multipipe will print everything to stdout to be read as input by the next script (and return None, so each function returns None, but you're not looking at the return values in this case anyway).

If you call them within some other python script, each function will return whatever you passed to multipipe.

Effectively, you can use your existing functions; just replace print(stuff, file=output) with return multipipe(stuff). Nice and simple.
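
For example, chaining them from another module then looks something like this (a sketch: input_string is just a placeholder, and main() takes the single argument used in the example above):

import script1
import script2

input_string = "some input"   # placeholder data

# When imported, multipipe() simply returns its argument,
# so each main() hands its result straight to the next one.
intermediate  = script1.main(input_string)
output_string = script2.main(intermediate)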

To use it with multithreading or multiprocessing, set the functions up so that each one returns a single thing, then plug them into a simple wrapper that feeds a queue. For an example of such a queueing system, see the sample at the bottom of the Queue docs. With that setup, just make sure each step in the pipeline puts None (or another sentinel value of your choice; I like ..., since it's extremely rare to pass the Ellipsis object for any reason other than as a marker of its singleton-ness) into the queue to the next step to signal that it is done.
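
A sketch of such a wrapper (worker is a hypothetical name; ... is the sentinel, and step is a single-item function like the main() above):

SENTINEL = ...   # the Ellipsis object, used as the done-marker

def worker(step, in_q, out_q):
    # Apply a single-item function to everything arriving on in_q,
    # then pass the sentinel along so the next stage also stops.
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(step(item))

Wire one queue.Queue between each pair of steps and run one thread per worker(...) call, along the lines of the sample in the Queue docs.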

Upvotes: 1

Sci Prog

Reputation: 2691

For the "pipe in", use sys.stdin with the readlines() method. (Calling read() instead would return the entire input as a single string rather than line by line.)

For passing information from one thread to another, you can use Queue. You must define a way to signal the end of data. In my example, since all the data passed between threads is str, I simply use a None object to signal the end of data (since it cannot appear in the transmitted data).

One could also use more threads, or use different functions in threads.

I did not include sys.argv in my example to keep it simple. Modifying it to get parameters (parameter1, ...) should be easy; see the sketch after the test command below.

import sys
from threading import Thread
from Queue import Queue   # Python 2; on Python 3 use "from queue import Queue"

def stdin_to_queue( output_queue ):
  for inp_line in sys.stdin.readlines():     # feed the input one line at a time
    output_queue.put( inp_line, True, None )  # blocking, no timeout
  output_queue.put( None, True, None )    # signal the end of data


def main1(input_queue, output_queue, arg1, arg2):
  do_loop = True
  while do_loop:
    inp_data = input_queue.get(True)
    if inp_data is None:
      do_loop = False
      output_queue.put( None, True, None )  # signal end of data                                                    
    else:
      out_data = arg1 + inp_data.strip('\r\n').upper() + arg2 #  or whatever transformation...                                    
      output_queue.put( out_data, True, None )

def queue_to_stdout(input_queue):
  do_loop = True
  while do_loop:
    inp_data = input_queue.get(True)
    if inp_data is None:
      do_loop = False
    else:
      sys.stdout.write( inp_data )


def main():
  q12 = Queue()
  q23 = Queue()
  q34 = Queue()
  t1 = Thread(target=stdin_to_queue, args=(q12,) )
  t2 = Thread(target=main1, args=(q12,q23,'(',')') )
  t3 = Thread(target=main1, args=(q23,q34,'[',']') )
  t4 = Thread(target=queue_to_stdout, args=(q34,))
  t1.start()
  t2.start()
  t3.start()
  t4.start()


main()

Finally, I tested this program (Python 2) with a text file:

head sometextfile.txt | python script.py 
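
As for the parameters mentioned above, here is a sketch of how main() could pick them up from sys.argv and hand them to the worker threads (parse_args is the question's hypothetical helper; only one transformation thread is shown):

def main():
  arg1, arg2 = parse_args(sys.argv)   # hypothetical helper, as in the question
  q12 = Queue()
  q23 = Queue()
  t1 = Thread(target=stdin_to_queue, args=(q12,))
  t2 = Thread(target=main1, args=(q12, q23, arg1, arg2))
  t3 = Thread(target=queue_to_stdout, args=(q23,))
  for t in (t1, t2, t3):
    t.start()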

Upvotes: 2
