Why do I need to specify the size of this list? Otherwise I get a "list index out of range" error

Question

I am trying to parse a list of URLs from a webpage. I did the following:

Got a list of all "a" tags.
Used a for loop to call get("href") on each tag.
Inside the loop, assigned each returned value to the next index of a new, empty list called links.

But I kept getting an "index out of range" error. I thought it might be caused by the way I was incrementing the index into links, but I am sure that is not the case. This is the error-prone code:

import urllib
import bs4
url = "http://tellerprimer.ucdavis.edu/pdf/"
response = urllib.urlopen(url)
webpage = response.read()
soup = bs4.BeautifulSoup(webpage, 'html.parser')
i = 0
links = []

for tags in soup.find_all('a'):
    links[i] = str(tags.get('href'))
    i +=1
print i, links

I gave links a fixed length and that made the error go away, like so:

links = [0] * 89  # 89 is the length of soup.find_all('a')

I want to know what was causing this problem.

Andy · Accepted Answer

You are attempting to assign something to a non-existent index. When you create links, you create it as an empty list.

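Then you assign to links[i], but links is empty, so there is no ith index. Assigning to an index only replaces an element that already exists; it never makes the list longer, so Python raises an IndexError.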
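A minimal sketch of the three cases, written in Python 3 (the question's code is Python 2, but list indexing behaves the same way in both). The hrefs list is a made-up stand-in for the values returned by tags.get('href'); nothing is fetched from the web.

```python
# Python 3 sketch. `hrefs` is a hypothetical stand-in for the
# values returned by tags.get('href') in the question.
hrefs = ["a.pdf", "b.pdf", "c.pdf"]

# 1. Assigning into an empty list fails: index 0 does not exist yet.
links = []
try:
    links[0] = hrefs[0]
except IndexError as exc:
    error_message = str(exc)
print(error_message)  # list assignment index out of range

# 2. Preallocating (like [0] * 89 in the question) creates the indices
#    up front, so assignment works, but only while i < len(prealloc).
prealloc = [None] * len(hrefs)
for i, href in enumerate(hrefs):
    prealloc[i] = href

# 3. append() grows the list by one element per call, so no size
#    has to be known in advance.
appended = []
for href in hrefs:
    appended.append(href)

print(prealloc == appended)  # True
```

Case 2 is why [0] * 89 made the error disappear: the list already had 89 slots. It raises IndexError again if the page gains links, and leaves stray 0 entries at the end if the page loses some.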

The proper way to do this is:

links.append(str(tags.get('href')))

This also means that you can eliminate your i variable. It's not needed.

for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
print links

This will print all 89 links in your links list.
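Because the loop only turns each tag into a string, a list comprehension is a common, more compact alternative, and len(links) replaces the manual counter i. The sketch below is Python 3 and uses plain dictionaries as hypothetical stand-ins for BeautifulSoup tags (both support .get('href')), so it runs without fetching the page.

```python
# Python 3 sketch. The dicts are hypothetical stand-ins for the Tag
# objects returned by soup.find_all('a'); both expose .get('href').
tags = [{"href": "a.pdf"}, {"href": "b.pdf"}, {"name": "anchor"}]

# Build the list in one expression; it grows to the right size by itself.
links = [str(tag.get("href")) for tag in tags]

# The count that the question tracked with `i` is just the list's length.
count = len(links)
print(count, links)  # 3 ['a.pdf', 'b.pdf', 'None']
```

As in the original loop, an "a" tag with no href attribute makes get() return None, which str() turns into the string 'None'.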
