CNeo

Reputation: 746

Pipe output from subprocess to file, and then read it back

I have a Python script that runs a subprocess to get some data and then processes it. What I'm trying to achieve is to have the data written to a file, and then use the data from the file to do the processing. (The reason is that the subprocess is slow, but its output can change based on the date, time, and parameters I use, and I need to run the script frequently.)

I've tried various methods, including opening the file as w+ and seeking back to the beginning after the write is done, but nothing seems to work: the file is written, but when I try to read back from it (using file.readline()) I get EOF.

This is what I'm essentially trying to accomplish:

      import os
      import subprocess

      myFile = open(fileName, "w")
      p = subprocess.Popen(args, stdout=myFile)
      myFile.flush()    # force the file to disk
      os.fsync(myFile)  # ..
      myFile.close()

      myFile = open(fileName, "r")
      while myFile.readline():
        pass # do stuff
      myFile.close()

But even though the file is correctly written (after the script runs, I can see the contents of the file), readline never returns a valid line. As I said, I also tried using the same file object and calling seek(0) on it, with no luck. This only worked when opening the file as r+, which fails when the file doesn't already exist.

Any help would be appreciated. Also, if there's a cleaner way to do this, I'm open to it :)

PS: I realize I could Popen with stdout going to a pipe, read from the pipe, and write the data to the file line by line as I go, but I'm trying to separate creating the data file from reading it.
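For comparison, the pipe-based alternative from the PS might look roughly like this. It is a minimal sketch, not a library function: `run_and_tee` is a hypothetical helper name, and it assumes Python 2.7 or 3 and a command that writes text to stdout. Note that it does not give the separation the question asks for, because the file is written while the lines are being processed.

```python
import subprocess


def run_and_tee(args, file_name):
    """Run args, save each stdout line to file_name, and yield it to the caller.

    Hypothetical helper: the data file is written while the lines are
    being processed, so creation and reading are NOT separated here.
    """
    with open(file_name, "w") as out:
        p = subprocess.Popen(args, stdout=subprocess.PIPE,
                             universal_newlines=True)  # text-mode pipe
        for line in p.stdout:
            out.write(line)  # keep a copy on disk
            yield line       # hand the same line to the caller
        p.stdout.close()
        if p.wait() != 0:    # reap the child and surface failures
            raise subprocess.CalledProcessError(p.returncode, args)
```

Usage would be `for line in run_and_tee(args, fileName): ...`, with the processing done inside the loop.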

Upvotes: 1

Views: 2115

Answers (3)

CNeo

Reputation: 746

@James Aylett pointed me in the right direction: it appears that my problem was that the process started by subprocess.Popen hadn't finished running when I called .flush().

The solution is to call p.wait() right after the subprocess.Popen call, to let the underlying command finish. After doing that, all of the command's output is in the file, so the .flush() and the read-back work as expected.

So the above code becomes:

  import os
  import subprocess

  myFile = open(fileName, "w")
  p = subprocess.Popen(args, stdout=myFile)

  p.wait()          # <-- Missing line: block until the command has exited

  myFile.flush()    # force the file to disk
  os.fsync(myFile)  # ..
  myFile.close()

  myFile = open(fileName, "r")
  for line in myFile:
      pass  # do stuff with line
  myFile.close()

And then it all works!
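A slightly cleaner variant of the same fix, sketched here as an assumption rather than as the original poster's code: subprocess.check_call() (available since Python 2.5) waits for the command itself, so there is no separate p.wait() to forget, and it raises CalledProcessError if the command fails. The helper names write_command_output and read_lines are hypothetical. The explicit flush()/fsync() pair is dropped because the child writes straight to the file descriptor, so the Python file object's own buffer has nothing to flush, and other processes on the same machine can read the data without an fsync.

```python
import subprocess


def write_command_output(args, file_name):
    """Run args to completion with stdout redirected into file_name."""
    with open(file_name, "w") as out:
        # check_call() waits for the command to exit before returning, so
        # the file is complete when the with-block closes it. It raises
        # CalledProcessError on a non-zero exit status.
        subprocess.check_call(args, stdout=out)


def read_lines(file_name):
    """Read the finished file back, keeping each line instead of discarding it."""
    with open(file_name) as f:
        return [line.rstrip("\n") for line in f]  # stand-in for real processing
```

On Python 3.5+, subprocess.run(args, stdout=out, check=True) behaves the same way for this purpose.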

Upvotes: 0

Eduardo Ivanec

Reputation: 11862

This should work as-is, provided the subprocess finishes in time (see James's answer).

If you want to wait for it to finish, add p.wait() after the Popen invocation.

What is your actual while loop, though? while myFile.readline() makes it look as if you're not actually saving the line for anything. Try this:

    myFile = open(fileName, "r")
    print(myFile.readlines())
    myFile.close()

Or, if you want to interactively examine the state of your program:

    myFile = open(fileName, "r")
    import pdb; pdb.set_trace()
    myFile.close()

Then, once it stops, you can do things like print(myFile.readlines()).

Upvotes: 0

James Aylett

Reputation: 3372

The subprocess almost certainly hasn't finished before you try to read from the file. In fact, it's likely that the subprocess hasn't even written anything by then. For true separation, you're going to have to have the subprocess write to a temporary file and then replace the file you read from, so that you read either the previous version or the new version but never see a partially written new version.
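A minimal sketch of that temp-file-then-replace idea, assuming Python 3.3+ (for os.replace()) and that the temporary file is created in the same directory as the target, which keeps the rename on one filesystem and makes it atomic on POSIX. regenerate_atomically is a hypothetical name. On Windows, replacing a file that another process currently has open can fail, so treat this as POSIX-oriented.

```python
import os
import subprocess
import tempfile


def regenerate_atomically(args, file_name):
    """Write the command's output to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(file_name))
    # Temp file in the same directory, so os.replace() is a same-filesystem
    # rename rather than a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            subprocess.check_call(args, stdout=tmp)  # waits for the command
            tmp.flush()
            os.fsync(tmp.fileno())  # data on disk before the rename
        # Readers now see either the old file or the new one, never a partial one.
        os.replace(tmp_path, file_name)
    except BaseException:
        os.remove(tmp_path)  # a failed run leaves the previous file untouched
        raise
```

If the command fails, CalledProcessError propagates, the temporary file is deleted, and the last good version of the data file stays in place.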

You can do this in a number of ways; the easiest would be to change the subprocess, but I don't know if that's an option for you here. Alternatively, you can wrap it in your own separate script to manage the files. You probably don't want to call the subprocess in the script that analyses the output file either; you'll want a cron job or something similar to regenerate the file periodically.
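On the reading side, the analysis script can then simply open whatever complete file the scheduled job last installed, without ever running the slow command. A hypothetical sketch (load_latest is an illustrative name, and it assumes something like cron keeps the file current):

```python
import os
import time


def load_latest(file_name):
    """Read the most recent complete output without running the slow command.

    Returns (lines, age_in_seconds). Assumes a separate scheduled job
    keeps file_name up to date.
    """
    with open(file_name) as f:
        # fstat() on the open file, so the age belongs to the version
        # actually being read even if a new file is swapped in meanwhile.
        age = time.time() - os.fstat(f.fileno()).st_mtime
        lines = [line.rstrip("\n") for line in f]
    return lines, age
```

Taking the age from the open file descriptor rather than from the path matters when the writer replaces the file by renaming: on POSIX, the already-open handle keeps pointing at the old, complete version.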

Upvotes: 2
