Turning an object into an iterator in Python 3?

Question

I'm trying to port a library over to Python 3. It has a tokenizer for PDF streams. The reader class calls next() on these tokens. This worked in Python 2, but when I run it in Python 3 I get TypeError: 'PdfTokens' object is not an iterator.

Selections from tokens.py concerning iterators:

class PdfTokens(object):
    def __init__(self, fdata, startloc=0, strip_comments=True):
        self.fdata = fdata
        self.iterator = iterator = self._gettoks(startloc)
        self.next = next(iterator)

    def __iter__(self):
        return self.iterator

    def _gettoks(self, startloc, cacheobj=_cacheobj,
                       delimiters=delimiters, findtok=findtok, findparen=findparen,
                       PdfString=PdfString, PdfObject=PdfObject):
        fdata = self.fdata
        current = self.current = [(startloc, startloc)]
        namehandler = (cacheobj, self.fixname)
        cache = {}
        while 1:
            for match in findtok(fdata, current[0][1]):
                current[0] = tokspan = match.span()
                token = match.group(1)
                firstch = token[0]
                if firstch not in delimiters:
                    token = cacheobj(cache, token, PdfObject)
                elif firstch in '/<(%':
                    if firstch == '/':
                        # PDF Name
                        token = namehandler['#' in token](cache, token, PdfObject)
                    elif firstch == '<':
                        # << dict delim, or < hex string >
                        if token[1:2] != '<':
                            token = cacheobj(cache, token, PdfString)
                    elif firstch == '(':
                        ends = None  # For broken strings
                        if fdata[match.end(1)-1] != ')':
                            nest = 2
                            m_start, loc = tokspan
                            for match in findparen(fdata, loc):
                                loc = match.end(1)
                                ending = fdata[loc-1] == ')'
                                nest += 1 - ending * 2
                                if not nest:
                                    break
                                if ending and ends is None:
                                    ends = loc, match.end(), nest
                            token = fdata[m_start:loc]
                            current[0] = m_start, match.end()
                            if nest:
                                (self.error, self.exception)[not ends]('Unterminated literal string')
                                loc, ends, nest = ends
                                token = fdata[m_start:loc] + ')' * nest
                                current[0] = m_start, ends
                        token = cacheobj(cache, token, PdfString)
                    elif firstch == '%':
                        # Comment
                        if self.strip_comments:
                            continue
                    else:
                        self.exception('Tokenizer logic incorrect -- should never get here')

                yield token
                if current[0] is not tokspan:
                    break
            else:
                if self.strip_comments:
                    break
                raise StopIteration
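The last line of _gettoks also breaks on newer Python 3 releases. Since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is turned into a RuntimeError instead of quietly ending the iteration. A plain return is the Python 3 way to end a generator. The sketch below uses two hypothetical toy generators to show the difference:

```python
def old_style():
    yield 1
    # Pre-3.7 idiom: used to end the generator silently.
    raise StopIteration


def new_style():
    yield 1
    # Python 3 idiom: returning ends the generator normally.
    return


print(list(new_style()))  # [1]

try:
    list(old_style())
except RuntimeError as exc:
    # On Python 3.7+ the StopIteration is re-raised as a RuntimeError.
    print("RuntimeError:", exc)
```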

The beginning of the offending method in the pdfreader file that raises the error:

def findxref(fdata):
    ''' Find the cross reference section at the end of a file
    '''
    startloc = fdata.rfind('startxref')
    if startloc < 0:
        raise PdfParseError('Did not find "startxref" at end of file')
    source = PdfTokens(fdata, startloc, False)
    tok = next(source)

I was under the impression that all you needed to define a custom iterator object was an __iter__ method, a next() method, and raising StopIteration when the items run out. This class has all of these, and yet it still raises the TypeError.
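For comparison, the Python 3 protocol method is named __next__ (Python 2 called it next), and __iter__ usually just returns self. Below is a minimal sketch in the spirit of PdfTokens. TokenStream and its whitespace-splitting tokenizer are hypothetical stand-ins for the real PDF tokenizer:

```python
class TokenStream:
    """Hypothetical stand-in for PdfTokens: wraps a generator of tokens."""

    def __init__(self, data):
        self.iterator = self._gettoks(data)

    def _gettoks(self, data):
        # Toy tokenizer (assumption): split on whitespace, not PDF syntax.
        for token in data.split():
            yield token

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Delegate to the generator; its StopIteration reaches the caller.
        return next(self.iterator)


source = TokenStream("startxref 1234 %%EOF")
print(next(source))  # startxref
print(list(source))  # ['1234', '%%EOF']
```

If __next__ is defined on the class like this, a call such as next(source) in findxref works unchanged. Calling iter() first, as the accepted answer does, is the alternative that leaves the class alone.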

Furthermore, this library and its methods worked in Python 2.7 but stopped working under Python 3. What changed in Python 3 to cause this? What can I do to make the PdfTokens object usable as an iterator?
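The built-in next() only uses a __next__ method defined on the object's class. Attributes set on the instance are not consulted, so the self.next attribute assigned in PdfTokens.__init__ cannot make next(source) work. The code most likely worked under Python 2 because it called source.next() as an ordinary method call, and ordinary attribute lookup does find instance attributes. A call rewritten as next(source), for example by 2to3, does not. Note also that next(iterator) on that line runs the generator. As a result, self.next holds the first token rather than a method, and that token is skipped by later callers. The sketch below uses hypothetical class names:

```python
class InstanceNext:
    """Assigns __next__ on the instance only; next() ignores it."""

    def __init__(self):
        self.__next__ = lambda: 42


class ClassNext:
    """Defines __next__ on the class, where next() looks for it."""

    def __next__(self):
        return 42


print(next(ClassNext()))  # 42

try:
    next(InstanceNext())
except TypeError as exc:
    # Same error as in the question: 'InstanceNext' object is not an iterator
    print(exc)
```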

Ashwini Chaudhary · Accepted Answer

You cannot call next() on a PdfTokens instance directly; you need to get its iterator first by calling iter() on it. That's exactly what a for-loop does as well*: it calls iter() on the object to get an iterator, and then, inside the loop, __next__ is invoked on that iterator until it is exhausted:

instance = PdfTokens(fdata, startloc, False)
source = iter(instance)
tok = next(source)
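The for-loop behaviour described in the answer can be spelled out by hand. The following is a simplified sketch of the equivalent while loop, not CPython's actual implementation. The helper name loop_by_hand is hypothetical:

```python
def loop_by_hand(iterable):
    """Collect items the way a for-loop visits them (simplified sketch)."""
    items = []
    iterator = iter(iterable)  # step 1: ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: uses type(iterator).__next__
        except StopIteration:
            break  # the iterator is exhausted
        items.append(item)
    return items


print(loop_by_hand("abc"))  # ['a', 'b', 'c']
```

For PdfTokens, iter(instance) returns the generator stored in self.iterator. Generators do define __next__, which is why the three lines above succeed.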

* Well, not always: if the class defines no __iter__, the iterator protocol falls back to __getitem__, if that is defined.
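The fallback in the footnote is the older sequence protocol. When a class has no __iter__, iter() builds an iterator that calls __getitem__ with 0, 1, 2 and so on until it raises IndexError. The sketch below uses a hypothetical Squares class. Even here next(squares) would still fail, because Squares itself is not an iterator; only the object returned by iter() is:

```python
class Squares:
    """Hypothetical example with no __iter__: iteration uses __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            # IndexError tells the fallback iterator to stop.
            raise IndexError(index)
        return index * index


squares = Squares(4)
print(list(squares))  # [0, 1, 4, 9]
it = iter(squares)    # works even though Squares has no __iter__
print(next(it))       # 0
```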
