Reputation: 1136
I'm trying to port a library over to Python 3. It has a tokenizer for PDF streams. The reader class calls next() on these tokens. This worked in Python 2, but when I run it in Python 3 I get TypeError: 'PdfTokens' object is not an iterator.
Selections from tokens.py concerning iterators:
class PdfTokens(object):

    def __init__(self, fdata, startloc=0, strip_comments=True):
        self.fdata = fdata
        self.iterator = iterator = self._gettoks(startloc)
        self.next = next(iterator)

    def __iter__(self):
        return self.iterator

    def _gettoks(self, startloc, cacheobj=_cacheobj,
                 delimiters=delimiters, findtok=findtok, findparen=findparen,
                 PdfString=PdfString, PdfObject=PdfObject):
        fdata = self.fdata
        current = self.current = [(startloc, startloc)]
        namehandler = (cacheobj, self.fixname)
        cache = {}
        while 1:
            for match in findtok(fdata, current[0][1]):
                current[0] = tokspan = match.span()
                token = match.group(1)
                firstch = token[0]
                if firstch not in delimiters:
                    token = cacheobj(cache, token, PdfObject)
                elif firstch in '/<(%':
                    if firstch == '/':
                        # PDF Name
                        token = namehandler['#' in token](cache, token, PdfObject)
                    elif firstch == '<':
                        # << dict delim, or < hex string >
                        if token[1:2] != '<':
                            token = cacheobj(cache, token, PdfString)
                    elif firstch == '(':
                        ends = None  # For broken strings
                        if fdata[match.end(1)-1] != ')':
                            nest = 2
                            m_start, loc = tokspan
                            for match in findparen(fdata, loc):
                                loc = match.end(1)
                                ending = fdata[loc-1] == ')'
                                nest += 1 - ending * 2
                                if not nest:
                                    break
                                if ending and ends is None:
                                    ends = loc, match.end(), nest
                            token = fdata[m_start:loc]
                            current[0] = m_start, match.end()
                            if nest:
                                (self.error, self.exception)[not ends]('Unterminated literal string')
                                loc, ends, nest = ends
                                token = fdata[m_start:loc] + ')' * nest
                                current[0] = m_start, ends
                        token = cacheobj(cache, token, PdfString)
                    elif firstch == '%':
                        # Comment
                        if self.strip_comments:
                            continue
                    else:
                        self.exception('Tokenizer logic incorrect -- should never get here')
                yield token
                if current[0] is not tokspan:
                    break
            else:
                if self.strip_comments:
                    break
        raise StopIteration
The beginning of the offending method in the pdfreader file that raises the error:
def findxref(fdata):
    ''' Find the cross reference section at the end of a file
    '''
    startloc = fdata.rfind('startxref')
    if startloc < 0:
        raise PdfParseError('Did not find "startxref" at end of file')
    source = PdfTokens(fdata, startloc, False)
    tok = next(source)
I was under the impression that all you needed to define a custom iterator object was an .__iter__ method, a .next() method, and to raise a StopIteration error. This class has all of these things, and yet it still raises the TypeError.
Furthermore, this library and its methods worked in Python 2.7 and have ceased to work in a Python 3 environment. What about Python 3 has made this different? What can I do to make the PdfTokens object iterable?
Upvotes: 0
Views: 202
Reputation: 250871
You cannot call next() on a PdfTokens instance directly. In Python 3 the class defines __iter__ but not __next__, so its instances are iterable but are not iterators. You need to get the iterator first by calling iter() on the instance. That's exactly what a for-loop does as well*: it calls iter() on the object first to get an iterator, and then within the loop __next__ is invoked on that iterator until it is exhausted:
instance = PdfTokens(fdata, startloc, False)
source = iter(instance)
tok = next(source)
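To illustrate the difference between Python 2 and 3 here, below is a minimal sketch. Tokens and DirectTokens are hypothetical stand-ins I made up for the example; they are not the library's code. Python 3's built-in next() looks for __next__ on the class, so an attribute named next (the Python 2 spelling) is simply ignored. Defining __next__ on the class is another possible fix if you would rather keep calling next() on the object itself.

```python
class Tokens(object):
    """Hypothetical stand-in for PdfTokens: iterable, but not an iterator."""

    def __init__(self, data):
        self.iterator = iter(data)
        # Python 2 style: an attribute called 'next'. Python 3's built-in
        # next() ignores it and looks for __next__ on the class instead.
        self.next = self.iterator.__next__

    def __iter__(self):
        return self.iterator


class DirectTokens(Tokens):
    """Same idea, but also an iterator because the class defines __next__."""

    def __next__(self):
        return next(self.iterator)


source = Tokens(["startxref", "1234", "%%EOF"])
try:
    next(source)
except TypeError as exc:
    error_message = str(exc)  # same kind of error as in the question

# Works: iter() hands back the underlying iterator, which has __next__.
first = next(iter(source))

# Works: __next__ is defined on the class, so the object is an iterator.
direct = DirectTokens(["startxref", "1234"])
second = next(direct)
```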
* Well, not always: if there's no __iter__ defined on the class, then the iterator protocol falls back to __getitem__, if defined.
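A small sketch of that fallback, using a hypothetical Squares class that has no __iter__ at all. iter() builds an iterator that calls __getitem__ with 0, 1, 2, ... until IndexError is raised:

```python
class Squares(object):
    """Hypothetical class with no __iter__, only __getitem__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


# list() and for-loops both go through iter(), which uses the fallback.
values = list(Squares())

# The object returned by iter() is a real iterator, so next() works on it.
iterator = iter(Squares())
first = next(iterator)
```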
Upvotes: 2