Reputation: 5101
I am working on a problem and have hit a wall.
I have a (potentially large) set of text files, and I need to apply a sequence of filters and transformations to them and export the results elsewhere.
So I roughly have:
def apply_filter_transformer(basepath = None, newpath = None, fts= None):
    #because all the raw studies in basepath should not be modified, so I first cp all to newpath
    for i in listdir(basepath):
        file(path.join(newpath, i), "wb").writelines(file(path.join(basepath, i)).readlines())
    for i in listdir(newpath):
        fileobj = open(path.join(newpath, i), "r+")
        for fcn in fts:
            fileobj = fcn(fileobj)
        if fileobj is not None:
            fileobj.writelines(fileobj.readlines())
        try:
            fileobj.close()
        except:
            print i, "at", fcn
            pass
def main():
    apply_filter_transformer(path.join(pardir, pardir, "studies"),
                             path.abspath(path.join(pardir, pardir, "filtered_studies")),
                             [
                                 #transformer_addMemo,
                                 filter_executable,
                                 transformer_identity,
                                 filter_identity,
                             ])
Here fts in apply_filter_transformer is a list of functions, each of which takes a Python file object and returns a Python file object. The problem I ran into is that when I try to insert a string into a file, I get an uninformative error, and I have been stuck on it all morning.
def transformer_addMemo(fileobj):
    STYLUSMEMO =r"""hellow world"""
    study = fileobj.read()
    location = re.search(r"</BasicOptions>", study)
    print fileobj.name
    print fileobj.mode
    fileobj.seek(0)
    fileobj.write(study[:location.end()] + STYLUSMEMO + study[location.end():])
    return fileobj
and this gives me
Traceback (most recent call last):
  File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 292, in <module>
    main()
  File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 288, in main
    filter_identity,
  File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 276, in apply_filter_transformer
    fileobj.writelines(fileobj.readlines())
IOError: [Errno 0] Error
If anyone can tell me more about this error, I would appreciate it very much.
Upvotes: 0
Views: 924
Reputation: 96920
It's not really possible to tell what's causing the error from the code you posted. The problem may be in the protocol you've adopted for your transformation functions.
I'll simplify the code a bit:
fileobj = open(path, mode)
fileobj = fcn(fileobj)
fileobj.writelines(fileobj.readlines())
What assurance do I have that fcn returns a file that's open in the mode that my original file was? That it returns a file that's open at all? That it returns a file? Well, I don't.
It doesn't seem like there's any reason for you to even be using file objects in your process. Since you're reading the entire file into memory, why not just make your transformation functions take and return strings? So your code would look like this:
with open(filename, "r") as f:
    s = f.read()
for transform_function in transforms:
    s = transform_function(s)
with open(filename, "w") as f:
    f.write(s)
Among other things, this totally decouples the file I/O part of your program from the data-transformation part, so that problems in one don't affect the other.
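To make the string-based protocol concrete, here is a minimal sketch of how the pipeline from the question might look with it. It is written for Python 3, and several things in it are assumptions rather than anything from the original code: the names add_memo and apply_transforms, the memo text, and the convention that a transform returning None means "drop this file".

```python
import os
import re


def add_memo(text, memo="hello world"):
    """String version of the memo transformer: insert memo after </BasicOptions>."""
    match = re.search(r"</BasicOptions>", text)
    if match is None:
        return text  # no anchor tag found; leave the text unchanged
    return text[:match.end()] + memo + text[match.end():]


def apply_transforms(src_dir, dst_dir, transforms):
    """Read each file in src_dir, run the transforms, write the result to dst_dir.

    The originals in src_dir are only ever opened for reading.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src, "r") as f:
            text = f.read()
        for transform in transforms:
            text = transform(text)
            if text is None:  # assumed convention: None filters the file out
                break
        if text is not None:
            with open(os.path.join(dst_dir, name), "w") as f:
                f.write(text)
```

Because every file is opened either for reading or for writing, never both, there is no "r+" handle being passed around, and the separate copy step is no longer needed.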
Upvotes: 1
Reputation: 115011
There is a handy Python module for reading or modifying a group of files: fileinput
I'm not sure what is causing this error, but you are reading each whole file into memory, which is a bad idea in your case because the files are potentially large. Using fileinput, you can rewrite the files in place easily. For example:
import fileinput
import sys
for line in fileinput.input(list_of_files, inplace=True):
    sys.stdout.write(line)
    if keyword in line:
        sys.stdout.write(my_text)
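The same idea as a self-contained sketch, written for Python 3. The function name insert_after_keyword and its parameters are hypothetical, chosen for illustration. Since fileinput rewrites files in place, it should be pointed at the copies, not at the raw studies.

```python
import fileinput
import sys


def insert_after_keyword(paths, keyword, extra_text):
    """Rewrite each file in place, writing extra_text after every line containing keyword."""
    if not paths:
        return  # an empty list would make fileinput fall back to sys.argv / stdin
    # With inplace=True, fileinput redirects sys.stdout into the file being rewritten,
    # so only one line is held in memory at a time.
    with fileinput.input(files=paths, inplace=True) as lines:
        for line in lines:
            sys.stdout.write(line)
            if keyword in line:
                sys.stdout.write(extra_text)
```

A call such as insert_after_keyword(paths, "</BasicOptions>", "memo\n") would add a memo line after each line containing the closing tag. This works line by line, so it only matches keywords that do not span a line break.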
Upvotes: 1