Reputation: 2389
I am trying to parse a website for all links that have the attribute rel="nofollow". I want to print that list, one link at a time.
However, I failed to append the results of findAll() to my list box (my attempt is commented out in the code). What did I do wrong?
import sys
import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen(sys.argv[1]).read()
soup = BeautifulSoup(page)
soup.prettify()
box = []
for anchor in soup.findAll('a', href=True, attrs = {'rel' : 'nofollow'}):
    # box.extend(anchor['href'])
    print anchor['href']
# print box
Upvotes: 0
Views: 1518
Reputation: 1122182
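You are looping over the results of soup.findAll(), so each anchor is a single element, not a list. box.extend() expects an iterable; given a string such as anchor['href'], it adds every character as a separate element. Use .append() for individual elements: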
box.append(anchor['href'])
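To see the difference without fetching a page, here is a minimal sketch that uses a made-up URL string (href is a hypothetical stand-in for anchor['href'], not something from your page):

```python
# Hypothetical value standing in for anchor['href'].
href = "http://a.b"

extended = []
extended.extend(href)   # iterates over the string, adding one character at a time

appended = []
appended.append(href)   # adds the whole string as a single element

print(extended)
print(appended)
```

The first list ends up with one entry per character of the URL, while the second holds the URL as a single entry, which is what you want for box.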
You could also use a list comprehension to grab all href attributes:
box = [a['href'] for a in soup.findAll('a', href=True, attrs = {'rel' : 'nofollow'})]
Upvotes: 1