Reputation: 4924
urllist = ['http://example.com',
           'http://example1.com']
i = 0
while i < len(urllist):
    source = urllib.urlopen(urllist[i]).read()
    regex = '(\d{3})/">(\w+\s-\s\w+)</a>'  # e.g. '435', 'Tom-Jerry'
    p = re.compile(regex)
    db = re.findall(p, source)
    db = [tuple(filter(None, t)) for t in db]
    hero_id = []
    for j in db:
        hero_id.append(j[0])
    i += 1
print hero_id
Note that after the line db = [tuple(filter(None, t)) for t in db], db is a list of tuples like this: [('564', 'Tom', 'Jerry'), ('321', 'X-man', 'Hulk')]. Up to the hero_id = [] line everything works like a charm. The for loop needs to append every number (from every url in urllist). It only partly does its job: at the end, the hero_id list contains only the numbers from the last url (the previous numbers are gone). Ideas?
Upvotes: 0
Views: 97
Reputation: 6326
That's because you set hero_id to an empty list (hero_id = []) at every iteration of the while loop, so the numbers collected from earlier urls are thrown away. Place that line just after i = 0, before the loop starts.
Or you can simplify the code like so:
import re
import urllib

urllist = ['http://example.com', 'http://example1.com']
hero_id = []  # created once, before the loop
for url in urllist:
    db = re.findall('(\d{3})/">(\w+\s-\s\w+)</a>', urllib.urlopen(url).read(), re.DOTALL)
    for j in db:
        hero_id.append(tuple(filter(None, j))[0])
print hero_id
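To see the difference without hitting the network, here is a minimal sketch in Python 3 syntax. The HTML snippets and hero names are made up for illustration and stand in for what urllib.urlopen(url).read() would return; only the regex comes from the question.

```python
import re

# Hypothetical page sources, one string per url (not real data).
pages = [
    '<a href="/hero/435/">Tom - Jerry</a>',
    '<a href="/hero/564/">Bugs - Daffy</a><a href="/hero/321/">Tweety - Sylvester</a>',
]

pattern = re.compile(r'(\d{3})/">(\w+\s-\s\w+)</a>')


def ids_reset_inside(sources):
    """Mimics the question: the list is recreated on every pass."""
    hero_id = []
    for source in sources:
        hero_id = []  # earlier results are discarded here
        for match in pattern.findall(source):
            hero_id.append(match[0])
    return hero_id


def ids_reset_outside(sources):
    """The list is created once, so results accumulate across pages."""
    hero_id = []
    for source in sources:
        for match in pattern.findall(source):
            hero_id.append(match[0])
    return hero_id


buggy = ids_reset_inside(pages)
fixed = ids_reset_outside(pages)
print(buggy)  # only the ids from the last page
print(fixed)  # ids from every page
```

The only difference between the two functions is where the empty list is created.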
Upvotes: 4
Reputation: 4330
Since hero_id is set inside the while loop, it is overwritten at every iteration. Define hero_id once at the outer (global) level, before the loop, and do not reset it inside the loop.
hero_id = []
while i < len(urllist):
    # your code, minus the hero_id = [] line
    i += 1
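The list does not have to be a module-level variable; it only has to be created once, before the loop. One way to sketch that (Python 3 syntax; collect_hero_ids, HERO_LINK and the sample strings are hypothetical names and data, not from the question) is to wrap the loop in a function and return the list:

```python
import re

# Regex taken from the question; the constant name is an assumption.
HERO_LINK = re.compile(r'(\d{3})/">(\w+\s-\s\w+)</a>')


def collect_hero_ids(sources):
    """Return the three-digit ids found in every page source, in order."""
    hero_id = []  # created once per call, before the loop
    for source in sources:
        # findall yields (number, name) tuples; keep only the number
        hero_id.extend(number for number, _name in HERO_LINK.findall(source))
    return hero_id


# Made-up page sources standing in for the downloaded HTML.
sample_sources = [
    '<a href="/hero/435/">Tom - Jerry</a>',
    '<a href="/hero/564/">Bugs - Daffy</a><a href="/hero/321/">Tweety - Sylvester</a>',
]

result = collect_hero_ids(sample_sources)
print(result)
```

Because the list lives inside the function, calling it again starts from a fresh list instead of appending to leftovers from a previous run.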
Upvotes: 1