Reputation: 1601
Executing this in shell gets me tangible results:
wget -O c1 --no-cache "http://some.website" | sed "1,259d" c1 | sed "4,2002d"
Doing this in Python gets me nothing:
subprocess.call(shlex.split("wget -O c1 --no-cache \"http://some.website/tofile\""))
c1 = open("c1",'w')
first = subprocess.Popen(shlex.split("sed \"1,259d\" c1"), stdout=subprocess.PIPE)
subprocess.Popen(shlex.split("sed \"4,2002d\""), stdin=first.stdout, stdout=c1)
c1.close()
Doing this also gets me no results:
c1.write(subprocess.Popen(shlex.split("sed \"4,2002d\""), stdin=first.stdout, stdout=subprocess.PIPE).communicate()[0])
By 'gets me nothing' I mean blank output in the file. Does anyone see anything out of the ordinary here?
Upvotes: 1
Views: 331
Reputation: 1601
In the interest of making life easier for people who more-or-less may be running into the same type of problem, I have decided to post the final revised code, which factored in comments about c1
and overwriting of data. Of particular interest is the usage of communicate()
which helped to completely eliminate any manifestations of zombie processes, which were quite irritating. Also, I found it useful to use subprocess.call
in portions where piping wasn't necessary. No wait()
was necessary in the end. Ultimately, staying away from sed
and wget
is a good idea, especially with Python's inbuilt tools and urllib2
.
p0 = subprocess.call(shlex.split("wget -Oc1 --no-cache \"http://Some.website/tofile\""))
p1 = subprocess.Popen(shlex.split("sed \"1,261d\" c1"), stdout=subprocess.PIPE)
with open("cc1", 'w') as cc1:
p2 = subprocess.Popen(shlex.split("sed \"3,2002d\""), stdin=p1.stdout, stdout=cc1)
p2.communicate()
p1.communicate()
p3 = subprocess.call(shlex.split("mv cc1 c1"))
Upvotes: 0
Reputation: 34270
The statement c1 = open("c1",'w')
opens file c1
for writing and truncates any existing data, so everything wget wrote to the file gets erased before you call sed.
Anyway, I think shlex.split
is generally awkward. I prefer to build the args list manually:
from subprocess import Popen, PIPE
p0 = Popen(['wget', '-O', '-', 'http://www.google.com'], stdout=PIPE)
p1 = Popen(['sed', '2,8d'], stdin=p0.stdout, stdout=PIPE)
with open('c1', 'w') as c1:
p2 = Popen(['sed', '2,7d'], stdin=p1.stdout, stdout=c1)
p2.wait()
However, there's no obvious reason a Python programmer should have to call out to sed. Python has string methods and regular expressions. Also, instead of wget you can use urllib2.urlopen
.
Upvotes: 2
Reputation: 64298
I always use plumbum for running external commands. It provides a very intuitive interface and, of course, takes care of escaping for me.
Would look something like:
from plumbum.cmd import wget, sed
cmd1 = wget['-O', 'c1']['--no-cache']["http://some.website"]
cmd2 = sed["1,259d"]['c1'] | sed["4,2002d"]
print cmd1
cmd1() # run it
print cmd2
cmd2() # run it
Upvotes: 3
Reputation: 32522
Why not just do everything all in pipes and send the output to a file?
wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d" > c1
Or if you don't want to send it to a file, and want it on stdout instead:
wget -O - "http://www.google.com" | sed "1,259d" | sed "4,2002d"
And if you want to do it in Python:
pipe = subprocess.Popen(shlex.split("wget -O - \"http://www.google.com\" | sed \"1,259d\" | sed \"4,2002d\""), stdout=subprocess.PIPE)
result = pipe.communicate()[0]
Upvotes: 1