eazar001

Reputation: 1601

Sub-processing pipe write to file malfunction

Executing this in shell gets me tangible results:

wget -O c1 --no-cache "http://some.website" | sed "1,259d" c1 | sed "4,2002d"

Doing this in Python gets me nothing:

subprocess.call(shlex.split("wget -O c1 --no-cache \"http://some.website/tofile\""))
c1 = open("c1",'w')
first = subprocess.Popen(shlex.split("sed \"1,259d\" c1"), stdout=subprocess.PIPE)

subprocess.Popen(shlex.split("sed \"4,2002d\""), stdin=first.stdout, stdout=c1)
c1.close()

Doing this also gets me no results:

c1.write(subprocess.Popen(shlex.split("sed \"4,2002d\""), stdin=first.stdout, stdout=subprocess.PIPE).communicate()[0])

By 'gets me nothing' I mean blank output in the file. Does anyone see anything out of the ordinary here?

Upvotes: 1

Views: 331

Answers (4)

eazar001

Reputation: 1601

For anyone running into the same kind of problem, here is the final revised code, which takes into account the comments about c1 and the overwriting of data. Of particular interest is the use of communicate(), which completely eliminated the zombie processes I had been seeing. I also found it useful to use subprocess.call in the places where no piping was needed. No wait() was necessary in the end. Ultimately, staying away from sed and wget is a good idea, given Python's built-in tools and urllib2.

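import shlex
import subprocess

# Download to c1; no pipe is involved, so call() is enough.
p0 = subprocess.call(shlex.split("wget -Oc1 --no-cache \"http://Some.website/tofile\""))
p1 = subprocess.Popen(shlex.split("sed \"1,261d\" c1"), stdout=subprocess.PIPE)

# Write to a separate file (cc1) so that c1 is not truncated while sed reads it.
with open("cc1", 'w') as cc1:
    p2 = subprocess.Popen(shlex.split("sed \"3,2002d\""), stdin=p1.stdout, stdout=cc1)
    p2.communicate()
    p1.communicate()

# Replace the original only after cc1 has been closed.
p3 = subprocess.call(shlex.split("mv cc1 c1"))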
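For reference, the general two-stage pipeline pattern can be sketched as below. This is a minimal Python 3 sketch, not the code from this answer: run_pipeline is a hypothetical helper, and the Python interpreter stands in for wget and sed so the example runs anywhere. It closes the parent's copy of the first process's stdout, then collects both children so neither is left as a zombie.

```python
import subprocess
import sys

def run_pipeline(first_cmd, second_cmd):
    """Run first_cmd | second_cmd and return (output, returncodes)."""
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close the parent's copy so p1 can receive SIGPIPE if p2 exits early.
    p1.stdout.close()
    output, _ = p2.communicate()  # reads to EOF and reaps p2
    p1.wait()                     # reaps p1 so it does not linger as a zombie
    return output, (p1.returncode, p2.returncode)

# Stand-in commands (assumptions): the interpreter plays producer and filter.
producer = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
# The filter drops the first line, roughly like sed "1d".
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(sys.stdin.readlines()[1:])"]

output, codes = run_pipeline(producer, consumer)
print(output, codes)
```

With real commands, the two argument lists would be something like ['sed', '1,261d', 'c1'] and ['sed', '3,2002d'].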

Upvotes: 0

Eryk Sun

Reputation: 34270

The statement c1 = open("c1",'w') opens file c1 for writing and truncates any existing data, so everything wget wrote to the file gets erased before you call sed.
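The truncation is easy to reproduce in isolation. The following is a small Python 3 sketch using a temporary file in place of c1; size_after_reopen is a hypothetical helper written only for this illustration.

```python
import os
import tempfile

def size_after_reopen(mode):
    """Write some data, reopen the file with `mode`, and return its size."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as f:
            f.write("downloaded page\n")  # stands in for what wget saved
        # Reopen the file, as open("c1", 'w') does after wget has finished.
        with open(path, mode):
            pass
        return os.path.getsize(path)
    finally:
        os.remove(path)

print(size_after_reopen("w"))  # 0: opening with 'w' truncates immediately
print(size_after_reopen("r"))  # non-zero: opening with 'r' leaves the data alone
```

Nothing has to be written for the data to disappear; the open() call with 'w' is enough.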

Anyway, I think shlex.split is generally awkward. I prefer to build the args list manually:

from subprocess import Popen, PIPE

p0 = Popen(['wget', '-O', '-', 'http://www.google.com'], stdout=PIPE)
p1 = Popen(['sed', '2,8d'], stdin=p0.stdout, stdout=PIPE) 
with open('c1', 'w') as c1:
    p2 = Popen(['sed', '2,7d'], stdin=p1.stdout, stdout=c1)
    p2.wait()

However, there's no obvious reason a Python programmer should have to call out to sed. Python has string methods and regular expressions. Also, instead of wget you can use urllib2.urlopen.
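A rough sketch of that approach, assuming Python 3 (where urllib2.urlopen became urllib.request.urlopen). The helper names delete_lines and fetch_and_trim are made up for this example, and the line ranges are the ones from the question.

```python
from urllib.request import urlopen  # Python 3 name for urllib2.urlopen

def delete_lines(lines, first, last):
    """Drop lines first..last (1-based, inclusive), like sed 'first,lastd'."""
    return [line for n, line in enumerate(lines, start=1)
            if not first <= n <= last]

def fetch_and_trim(url, path):
    """Hypothetical helper: download url, apply both deletions, write to path."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    lines = text.splitlines(True)          # keep the line endings
    lines = delete_lines(lines, 1, 259)    # first sed stage
    lines = delete_lines(lines, 4, 2002)   # second sed stage, renumbered input
    with open(path, "w") as out:
        out.writelines(lines)

# Offline check of the line arithmetic on ten sample lines.
sample = ["line %d\n" % n for n in range(1, 11)]
trimmed = delete_lines(delete_lines(sample, 1, 3), 2, 4)
print(trimmed)  # ['line 4\n', 'line 8\n', 'line 9\n', 'line 10\n']
```

As with chained sed commands, the second range counts lines in the output of the first deletion, not in the original page.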

Upvotes: 2

shx2

Reputation: 64298

I always use plumbum for running external commands. It provides a very intuitive interface and, of course, takes care of escaping for me.

Would look something like:

from plumbum.cmd import wget, sed
cmd1 = wget['-O', 'c1']['--no-cache']["http://some.website"]
cmd2 = sed["1,259d"]['c1'] | sed["4,2002d"]
print cmd1
cmd1()  # run it
print cmd2
cmd2()  # run it

Upvotes: 3

Jordan

Reputation: 32522

Why not just do everything all in pipes and send the output to a file?

wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d" > c1

Or if you don't want to send it to a file, and want it on stdout instead:

wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d"

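And if you want to do it in Python, hand the whole pipeline to the shell with shell=True, since | is shell syntax and an argument list built with shlex.split would pass it to wget as a literal argument:

pipe = subprocess.Popen("wget -O - \"http://www.google.com\" | sed \"1,259d\" | sed \"4,2002d\"", shell=True, stdout=subprocess.PIPE)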
result = pipe.communicate()[0]

Upvotes: 1
