Reputation: 4924
urllist = ['http://example.com',
           'http://example1.com']
i = 0
while i < len(urllist):
    source = urllib.urlopen(urllist[i]).read()
    regex = '(\d{3})/">(\w+\s-\s\w+)</a>'  # e.g. '435', 'Tom-Jerry'
    p = re.compile(regex)
    db = re.findall(p, source)
    db = [tuple(filter(None, t)) for t in db]
    hero_id = []
    for i in db:
        hero_id.append(i[0])
    i += 1
print hero_id
db = [tuple(filter(None, t)) for t in db]
db is a list of tuples like this: [('564', 'Tom', 'Jerry'), ('321', 'X-man', 'Hulk')]
The logic behind this should be the following: start with urllist[0], search for the regex, and collect db. For every tuple in db, take the [0] element (the number) and append it to the hero_id list. When that is done, add 1 to i and repeat the whole process for the next URL in urllist until there are none left.
When I run this code, I get this:
i += 1
TypeError: can only concatenate tuple (not "int") to tuple
i += 1 in the code is outside the for loop, so this exception surprises me a little bit. Ideas?
Upvotes: 0
Views: 4117
Reputation: 3820
The for loop "for i in db:" is changing the value of i inside the while loop. Use a different (more descriptive) name in the for loop.
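A minimal sketch of that fix, written so it runs without network access. The names collect_hero_ids, HERO_PATTERN, and sample_pages are hypothetical, and the sample strings only stand in for what urllib.urlopen(url).read() would return. It also assumes the ids should accumulate across all pages, whereas the question's code resets hero_id on every pass of the while loop.

```python
import re

# Pattern copied from the question: a three-digit id, then "Name - Name" link text.
HERO_PATTERN = re.compile(r'(\d{3})/">(\w+\s-\s\w+)</a>')


def collect_hero_ids(pages):
    """Return the id (first regex group) of every match in every page."""
    hero_ids = []
    # Iterating over the pages directly removes the need for a manual counter,
    # so there is no index variable for the inner loop to overwrite.
    for source in pages:
        for match in HERO_PATTERN.findall(source):
            hero_ids.append(match[0])
    return hero_ids


# Hypothetical page sources (assumption: real pages contain links in this shape).
sample_pages = [
    '<a href="/hero/564/">Tom - Jerry</a>',
    '<a href="/hero/321/">Xman - Hulk</a>',
]

hero_ids = collect_hero_ids(sample_pages)
print(hero_ids)
```

If the while loop and its counter have to stay, renaming only the inner variable (for example "for entry in db:") is enough to make i += 1 work again.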
Upvotes: 2
Reputation: 2800
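The "for i in db" loop assigns a tuple to i on every iteration. A for loop does not create a new scope: i belongs to the enclosing function (or to the module, if this is module-level code), so it is the same i that the while loop uses as its counter.
In Python 2, generator expressions (and, in 2.7, set and dict comprehensions) are the only loop constructs with their own scope; the variable of a list comprehension leaks just like that of a for loop.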
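The scoping behaviour can be checked with a short experiment. This is only an illustrative sketch with made-up data; the variable names leaked, total, and raised are hypothetical and not part of the question's code.

```python
# A for loop rebinds its target in the enclosing scope.
i = 0
for i in [('564', 'Tom'), ('321', 'Hulk')]:
    pass
leaked = i  # i is now the last tuple, not the integer counter

# Incrementing it reproduces the error from the question.
raised = False
try:
    i += 1
except TypeError:
    raised = True

# A generator expression has its own scope, so the outer j is untouched.
j = 0
total = sum(len(j) for j in ['ab', 'cde'])

print(leaked)
print(raised)
print(j)
```

The same lines behave identically under Python 2 and Python 3; the two versions differ only for list comprehensions, which stopped leaking their variable in Python 3.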
Upvotes: 2