Reputation: 25
I am trying to parse a list of URLs from a webpage, reading each link with get("href"). But I kept getting an index out of range error. I thought it might be because of the way I was incrementing the index of links, but I am sure that is not the case. This is the code that raises the error:
import urllib
import bs4
url = "http://tellerprimer.ucdavis.edu/pdf/"
response = urllib.urlopen(url)
webpage = response.read()
soup = bs4.BeautifulSoup(webpage, 'html.parser')
i = 0
links = []
for tags in soup.find_all('a'):
    links[i] = str(tags.get('href'))
    i += 1
print i, links
Giving links a fixed length made the error go away:
links = [0] * 89  # 89 is the number of tags returned by soup.find_all('a')
I want to know what was causing this problem.
Upvotes: 2
Views: 52
Reputation: 97140
The list is initially empty, so links[i] = ... tries to assign to an index that does not exist yet. Assigning by index can only replace an existing element; it never grows the list.
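Here is a minimal sketch of the failure in isolation. It is written for Python 3 (the question uses Python 2, but list indexing behaves the same way), and the href values are made-up placeholders rather than the real page contents.

```python
# Hypothetical href values standing in for the scraped anchors.
hrefs = ["a.pdf", "b.pdf", "c.pdf"]

links = []
try:
    links[0] = hrefs[0]  # the list has no slot 0 yet, so this raises
except IndexError as exc:
    error_message = str(exc)

# Preallocating creates the slots first, which is why [0] * 89 worked.
prealloc = [0] * len(hrefs)
for i, href in enumerate(hrefs):
    prealloc[i] = href  # replaces an element that already exists
```

Preallocating works, but it requires knowing the number of tags in advance, which is why growing the list is usually the simpler choice.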
Use append() to add items to a list:
links = []
for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
Or use map() instead:
links = map(lambda tags: str(tags.get('href')), soup.find_all('a'))
Or use a list comprehension:
links = [str(tags.get('href')) for tags in soup.find_all('a')]
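The three variants can be compared side by side in a small sketch. It assumes Python 3, where map() returns a lazy iterator and has to be wrapped in list() to get the same result as in Python 2. Plain dicts are used as hypothetical stand-ins for the tags, since they also offer a get() method that returns None for a missing key.

```python
# Hypothetical stand-ins for the tags; the last one has no href at all.
tags = [{"href": "a.pdf"}, {"href": "b.pdf"}, {}]

# Variant 1: grow the list with append().
with_append = []
for tag in tags:
    with_append.append(str(tag.get("href")))

# Variant 2: map(); list() is needed on Python 3 to materialize the result.
with_map = list(map(lambda tag: str(tag.get("href")), tags))

# Variant 3: list comprehension.
with_comprehension = [str(tag.get("href")) for tag in tags]
```

All three produce the same list. Note that a tag without an href ends up as the string 'None', because str() is applied to the None that get() returns.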
Upvotes: 1
Reputation: 50550
You are attempting to assign something to a non-existent index. When you create links, you create it as an empty list. Then you do links[i], but links is empty, so there is no ith index.
The proper way to do this is:
links.append(str(tags.get('href')))
This also means that you can eliminate your i variable. It's not needed.
links = []
for tags in soup.find_all('a'):
    links.append(str(tags.get('href')))
print links
This will print all 89 links in your links list.
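If the running count from the original print i, links is still wanted, it can be recovered without a hand-maintained counter. This is only a sketch, assuming Python 3 and using made-up href values in place of the scraped ones.

```python
# Hypothetical href values standing in for the scraped anchors.
hrefs = ["a.pdf", "b.pdf", "c.pdf"]

links = []
for href in hrefs:
    links.append(str(href))

# len() gives the final count that the manual counter i used to hold.
count = len(links)

# enumerate() pairs each link with its position when the index is needed.
numbered = list(enumerate(links))
```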
Upvotes: 4