alvas

Reputation: 122142

Subclassing a file object to "fake" it as an iterable - Python

My goal is to spare users from constantly calling seek(0) to restart reading a text file.

So instead I've tried to create a MyReader that's a collections.abc.Iterator, with a .reset() method that replaces seek(0). Between resets it continues from where it last yielded, because it retains a self.iterable object.

import collections.abc

class MyReader(collections.abc.Iterator):
    def __init__(self, filename):
        self.filename = filename
        self.iterable = self.__iterate__()

    def __iterate__(self):
        with open(self.filename) as fin:
            for line in fin:
                yield line.strip()

    def __iter__(self):
        for line in self.iterable:
            yield line

    def __next__(self):
        return next(self.iterable)

    def reset(self):
        self.iterable = self.__iterate__()

The usage would be something like:

$ cat english.txt
abc
def
ghi
jkl

$ python

>>> data = MyReader('english.txt')
>>> print(next(data))
abc
>>> print(next(data))
def
>>> data.reset()
>>> print(next(data))
abc

My question is: does this already exist somewhere in the Python-verse? In particular, if there's already a native object that does something like this, I would like to avoid reinventing the wheel =)

If it doesn't exist, does the object look a little unpythonic? It claims to be an iterator, but the true iterator is actually self.iterable, and the other methods just wrap around it to do the "resets".

Upvotes: 0

Views: 528

Answers (2)

PM 2Ring

Reputation: 55489

I have a couple of criticisms of your MyReader class. I was going to post an alternative that's a context manager but Sraw beat me to it. ;)

You shouldn't use names that start and end with double underscores like __iterate__. Such names are essentially reserved for the language implementors, and if an official __iterate__ magic method is added to the language your code will break. If you want a private method, you could name it _iterate.

There is a little problem with that __iterate__ method: its with block is only exited when the file has been completely read for the current self.iterable, so if the MyReader instance gets reset then you have an old open file sitting around, consuming a file descriptor. Sure, it'll get closed eventually, when the program exits (or you delete the MyReader instance), but it's messy IMHO.
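One possible way around the lingering file handle, as a rough sketch rather than a definitive fix: call the old generator's close() method inside reset(). That raises GeneratorExit at the paused yield, so the with block exits and the file is closed straight away. The self.file attribute below is not in the original class; it is an assumption added only so the demo can check that the old handle really is closed. The temporary english.txt is likewise just for illustration.

```python
import os
import tempfile


class MyReader:
    """Resettable line reader that closes the old file on reset (sketch)."""

    def __init__(self, filename):
        self.filename = filename
        self.file = None  # hypothetical attribute, kept only for inspection
        self.iterable = self._iterate()

    def _iterate(self):
        with open(self.filename) as fin:
            self.file = fin
            for line in fin:
                yield line.strip()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.iterable)

    def reset(self):
        # close() raises GeneratorExit inside the generator, which exits
        # the with block and closes the old file before a new one is opened.
        self.iterable.close()
        self.iterable = self._iterate()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "english.txt")
    with open(path, "w") as fout:
        fout.write("abc\ndef\nghi\njkl\n")

    data = MyReader(path)
    first = next(data)          # opens the file and reads the first line
    old_file = data.file
    data.reset()
    old_closed = old_file.closed
    after_reset = next(data)    # reads from a freshly opened file
    data.reset()                # close the second handle before cleanup
```

Calling close() on a generator that has not started yet is harmless, so reset() is safe to call before the first next().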

Also, I'm not totally happy with the yield line.strip(). Sure, it's convenient most of the time when you're reading a text file, but in some cases the caller may want to look at any leading or trailing white space, and you've taken that option away from them.
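If you want to keep the convenience but give the caller the choice, one option is a flag that defaults to stripping. This is only a sketch: read_lines and its strip parameter are hypothetical names, and io.StringIO stands in for a real file.

```python
import io


def read_lines(stream, strip=True):
    """Yield lines from stream, stripping whitespace unless strip is False."""
    for line in stream:
        yield line.strip() if strip else line


text = "  abc  \ndef\n"
stripped = list(read_lines(io.StringIO(text)))
raw = list(read_lines(io.StringIO(text), strip=False))
```

With strip=False the caller gets each line exactly as the file object produced it, including the trailing newline.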

BTW, that __iter__ method is redundant: the Iterator base class already supplies an __iter__ that returns self, so your class still does what it's supposed to do if you eliminate that method.
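To see the inherited __iter__ in action, here is a small illustration. Countdown is a made-up class, not part of your code; it defines only __next__ and relies on collections.abc.Iterator for the rest.

```python
from collections.abc import Iterator


class Countdown(Iterator):
    """Counts down from start to 1; defines __next__ only."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


c = Countdown(3)
same_object = iter(c) is c   # the inherited __iter__ returns self
values = list(c)             # works in for loops and list() as usual
```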

Upvotes: 1

Sraw

Reputation: 20224

I think it depends on your real situation. If you just want to get rid of file.seek(0), it can be simple:

class MyReader:
    def __init__(self, filename, mode="r"):
        self.file = open(filename, mode)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __iter__(self):
        self.file.seek(0)
        for line in self.file:
            yield line.strip()

    def close(self):
        self.file.close()

You can even use it like a normal context manager:

with MyReader("a.txt") as a:
    for line in a:
        print(line)
    for line in a:
        print(line)

output:

sdfas
asdf
asd
fas
df
asd
f
sdfas
asdf
asd
fas
df
asd
f

Upvotes: 3
