Reputation: 134
Here is my code:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')
urls = []
for h2_tag in soup.find_all("h2"):
    a_tag = h2_tag.find('a')
    urls.append(a_tag.attrs not in ['href'])
print(urls)
Here is the error:
AttributeError: 'NoneType' object has no attribute 'attrs'
What is wrong with my code?
Upvotes: 0
Views: 71
Reputation: 84455
My preference for cleaner code is to put the restriction into the selection of nodes rather than test for it later. In your case, you can do this with a CSS selector that retrieves only the h2 elements containing an a element. With a layout similar to yours:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')
urls = []
for h2_tag in soup.select('h2:has(a)'):
    a_tag = h2_tag.find('a')
    urls.append(a_tag['href'])
print(urls)
However, we can be much more concise than the above:
urls = [i['href'] for i in soup.select('h2 > a')]
print(urls)
The above selects a elements that are direct children of h2. Note that, unlike h2_tag.find('a'), it does not match links nested deeper inside the heading.
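To make the difference between the two selectors concrete, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The markup and the names SAMPLE_HTML, has_links and child_links are made up for illustration, and html.parser is used only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<h2><a href="/direct">Direct child link</a></h2>
<h2><span><a href="/nested">Nested link</a></span></h2>
<h2>No link here</h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# h2:has(a) keeps every h2 that contains an <a> anywhere inside it,
# so find('a') is guaranteed to return a tag for each match.
has_links = [h2.find("a")["href"] for h2 in soup.select("h2:has(a)")]

# h2 > a keeps only <a> elements that are direct children of an h2.
child_links = [a["href"] for a in soup.select("h2 > a")]

print(has_links)
print(child_links)
```

On this snippet the first list includes the nested link and the second does not, so which selector fits depends on how the headings on the target page are actually marked up.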
Upvotes: 1
Reputation: 19998
Sometimes h2_tag.find('a') will return None, because not every h2 on the page contains an a element. You can fix this problem by using a try/except:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')
urls = []
for h2_tag in soup.find_all("h2"):
    try:
        a_tag = h2_tag.find('a')
        urls.append(a_tag.attrs["href"])
    except AttributeError:
        continue
print(urls)
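As an alternative to catching the exception, the None case can be checked explicitly. The following is only a sketch against a made-up HTML snippet, not the real page: SAMPLE_HTML and collect_heading_links are hypothetical names, and html.parser is used so the example does not need lxml. It also skips a tags that have no href, which the try/except version above would not catch (that case raises KeyError rather than AttributeError).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<h2><a href="/first">First</a></h2>
<h2>Heading without a link</h2>
<h2><a>Anchor without href</a></h2>
<h2><a href="/second">Second</a></h2>
"""


def collect_heading_links(html):
    """Return the href of the first <a> inside each <h2>, skipping gaps."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")
        if a_tag is None:
            # This h2 has no link at all.
            continue
        href = a_tag.get("href")  # Tag.get returns None if the attribute is missing
        if href is not None:
            urls.append(href)
    return urls


print(collect_heading_links(SAMPLE_HTML))
```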
Upvotes: 1