Reputation: 310
I am using Python's csv module to extract data from a csv file that is constantly being updated by an external tool. I have run into a problem where, when I reach the end of the file, a StopIteration exception is raised; however, I would like the script to keep looping, waiting for more lines to be added by the external tool.
What I came up with so far to do this is:
import csv

f = open('file.csv')
csvReader = csv.reader(f, delimiter=',')
while 1:
    try:
        doStuff(csvReader.next())
    except StopIteration:
        # Remember the position, reopen the file, and seek back to it
        depth = f.tell()
        f.close()
        f = open('file.csv')
        f.seek(depth)
        csvReader = csv.reader(f, delimiter=',')
This has the intended functionality, but it also seems terrible. Looping after catching the StopIteration is not possible, since once StopIteration is thrown, every subsequent call to next() throws StopIteration again. Does anyone have suggestions on how to implement this in such a way that I don't have to do this silly telling and seeking? Or is there a different Python module that easily supports this functionality?
Upvotes: 4
Views: 2163
Reputation: 45086
You rarely need to catch StopIteration explicitly. Do this:
    for row in csvReader:
        doStuff(row)
As for detecting when new lines are written to the file, you can either popen a tail -f process or write out the Python code for what tail -f does. (It isn't complicated; it basically just stats the file every second to see if it's changed. Here's the C source code of tail.)
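A minimal sketch of that polling approach might look like this (the follow generator and its poll_interval argument are names of my own invention, not anything from tail):

    import time

    def follow(f, poll_interval=1.0):
        # Yield lines as they are appended to f; on EOF, sleep
        # briefly and try again instead of stopping.
        # (A writer caught mid-line can cause a partial read;
        # buffering until '\n' would harden this.)
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_interval)

Since csv.reader accepts any iterator that yields strings, you could feed it this generator directly: csv.reader(follow(open('file.csv'))).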
EDIT: Disappointingly, popening tail -f doesn't work as I expected in Python 2.x. It seems iterating over the lines of a file is implemented using fread and a largeish buffer, even if the file is supposed to be unbuffered (like when subprocess.py creates the file, passing bufsize=0). But popening tail would be a mildly ugly hack anyway.
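If you wanted to try it anyway, one possible workaround (untested here, and assuming tail is on your PATH) is to call readline() in a loop rather than iterating over the pipe, since readline() does not go through the iterator's readahead buffer:

    import subprocess

    p = subprocess.Popen(['tail', '-f', 'file.csv'],
                         stdout=subprocess.PIPE, bufsize=0)
    while True:
        line = p.stdout.readline()  # blocks until tail emits a full line
        if not line:
            break  # tail exited
        doStuff(line)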
Upvotes: 0
Reputation: 19145
Your problem is not with the CSV reader, but with the file object itself. You may still have to do the crazy gyrations you're doing in your snippet above, but it would be better to create a file object wrapper or subclass that does it for you, and use that with your CSV reader. That keeps the complexity isolated from your csv processing code.
For instance (warning: untested code):
    import time

    class ReopeningFile(object):
        def __init__(self, filename):
            self.filename = filename
            self.f = open(self.filename)

        def next(self):
            while True:
                try:
                    return self.f.next()
                except StopIteration:
                    # Remember where we stopped, reopen, and seek back
                    depth = self.f.tell()
                    self.f.close()
                    self.f = open(self.filename)
                    self.f.seek(depth)
                    time.sleep(1)  # Allow more data to come in
                    # May also need a way to signal a real StopIteration

        def __iter__(self):
            return self
Then your main code becomes simpler, as it is freed from having to manage the file reopening (note that you also don't have to restart your csv_reader whenever the file is reopened):
    import csv

    csv_reader = csv.reader(ReopeningFile('data.csv'))
    for each in csv_reader:
        process_csv_line(each)
Upvotes: 4
Reputation: 10820
Producer-consumer stuff can get a bit tricky. How about using seek and reading bytes instead? What about using a named pipe?
Heck, why not communicate over a local socket?
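For instance, a minimal sketch of the named-pipe idea (data.fifo is a made-up path for illustration, and this only helps if the producer can be pointed at the FIFO instead of a regular file):

    import csv
    import os

    fifo_path = 'data.fifo'  # hypothetical path; the producer must write here
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)  # Unix only

    # open() blocks until the producer opens the FIFO for writing, and
    # reads block until data arrives, so there is no polling loop and
    # no premature StopIteration while the writer keeps its end open.
    with open(fifo_path) as f:
        for row in csv.reader(f):
            doStuff(row)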
Upvotes: 2