

Turning an object into an iterator in Python 3?

I'm trying to port a library over to Python 3. It has a tokenizer for PDF streams. The reader class calls next() on these tokens. This worked in Python 2, but when I run it in Python 3 I get TypeError: 'PdfTokens' object is not an iterator.

Selections from the tokenizer source concerning iterators:

class PdfTokens(object):
    def __init__(self, fdata, startloc=0, strip_comments=True):
        self.fdata = fdata
        self.iterator = iterator = self._gettoks(startloc)
        self.next = next(iterator)

    def __iter__(self):
        return self.iterator

    def _gettoks(self, startloc, cacheobj=_cacheobj,
                       delimiters=delimiters, findtok=findtok, findparen=findparen,
                       PdfString=PdfString, PdfObject=PdfObject):
        fdata = self.fdata
        current = self.current = [(startloc, startloc)]
        namehandler = (cacheobj, self.fixname)
        cache = {}
        while 1:
            for match in findtok(fdata, current[0][1]):
                current[0] = tokspan = match.span()
                token = match.group(1)
                firstch = token[0]
                if firstch not in delimiters:
                    token = cacheobj(cache, token, PdfObject)
                elif firstch in '/<(%':
                    if firstch == '/':
                        # PDF Name
                        token = namehandler['#' in token](cache, token, PdfObject)
                    elif firstch == '<':
                        # << dict delim, or < hex string >
                        if token[1:2] != '<':
                            token = cacheobj(cache, token, PdfString)
                    elif firstch == '(':
                        ends = None  # For broken strings
                        if fdata[match.end(1)-1] != ')':
                            nest = 2
                            m_start, loc = tokspan
                            for match in findparen(fdata, loc):
                                loc = match.end(1)
                                ending = fdata[loc-1] == ')'
                                nest += 1 - ending * 2
                                if not nest:
                                    break
                                if ending and ends is None:
                                    ends = loc, match.end(), nest
                            token = fdata[m_start:loc]
                            current[0] = m_start, match.end()
                            if nest:
                                (self.error, self.exception)[not ends]('Unterminated literal string')
                                loc, ends, nest = ends
                                token = fdata[m_start:loc] + ')' * nest
                                current[0] = m_start, ends
                        token = cacheobj(cache, token, PdfString)
                    elif firstch == '%':
                        # Comment
                        if self.strip_comments:
                            continue
                    else:
                        self.exception('Tokenizer logic incorrect -- should never get here')

                yield token
                if current[0] is not tokspan:
                    break
            else:
                if self.strip_comments:
                    break
                raise StopIteration

The beginning of the offending method in the pdfreader file that raises the error:

def findxref(fdata):
    ''' Find the cross reference section at the end of a file
    '''
    startloc = fdata.rfind('startxref')
    if startloc < 0:
        raise PdfParseError('Did not find "startxref" at end of file')
    source = PdfTokens(fdata, startloc, False)
    tok = next(source)

I was under the impression that all you needed to define a custom iterator object was an .__iter__() method, a .next() method, and to raise a StopIteration error. This class has all these things and yet it still raises the TypeError.

Furthermore, this library and its methods worked in Python 2.7 and have ceased to work in a Python 3 environment. What about Python 3 has made this different? What can I do to make the PdfTokens object iterable?

Upvotes: 0

Views: 203

Answers (1)

Ashwini Chaudhary

Reputation: 251146

You cannot call next() directly on a PdfTokens instance; you first need to get its iterator by calling iter() on it. That is exactly what a for loop does as well*: it calls iter() on the object to get an iterator, then invokes __next__ on that iterator on each pass until it is exhausted:

instance = PdfTokens(fdata, startloc, False)
source = iter(instance)
tok = next(source)

* Not always: if the class defines no __iter__, iter() falls back to __getitem__, if that is defined.
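As for what changed: Python 3 renamed the iterator method from next to __next__, and the built-in next() looks it up on the class, so an instance attribute named next is ignored. The sketch below illustrates the difference with small stand-in classes. IterableOnly, TrueIterator, SequenceLike and first_token are hypothetical names made up for this example and are not part of the library in the question.

```python
class IterableOnly:
    """Has __iter__ but no __next__, like PdfTokens in the question."""

    def __init__(self, items):
        self._gen = self._generate(items)

    def __iter__(self):
        return self._gen

    def _generate(self, items):
        for item in items:
            yield item
        # Falling off the end (or a bare `return`) finishes the generator.


class TrueIterator(IterableOnly):
    """Adds __next__, so next(obj) works on the instance itself."""

    def __iter__(self):
        return self

    def __next__(self):
        # Delegate to the wrapped generator; it raises StopIteration when done.
        return next(self._gen)

    next = __next__  # optional alias for callers that still use obj.next()


class SequenceLike:
    """No __iter__ at all: iter() falls back to __getitem__ with 0, 1, 2, ..."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]  # IndexError ends the iteration


def first_token(source):
    """Return next(source), or the TypeError message if that is not allowed."""
    try:
        return next(source)
    except TypeError as exc:
        return str(exc)
```

With tokens = ['startxref', '1234', '%%EOF'], first_token(IterableOnly(tokens)) returns the "not an iterator" message, while first_token(TrueIterator(tokens)) and next(iter(IterableOnly(tokens))) both return 'startxref'. One related porting detail: from Python 3.7 on, an explicit raise StopIteration inside a generator body is turned into a RuntimeError (PEP 479), so a generator should finish with return or by falling off the end.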

Upvotes: 2
