Wolf

Reputation: 134

I am learning BeautifulSoup, but I am getting an error.

Here is my code:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for h2_tag in soup.find_all("h2"):
    a_tag = h2_tag.find('a')
    urls.append(a_tag.attrs not in ['href'])

print(urls)

Here is the error:

AttributeError: 'NoneType' object has no attribute 'attrs'

What is wrong with my code?

Upvotes: 0

Views: 71

Answers (2)

QHarr

Reputation: 84455

My preference for cleaner code is to put the restriction into the selection of nodes rather than test for it later. In your case, you can do this with a CSS selector that retrieves only the h2 elements containing an a element. Using a layout similar to yours:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')
urls = []

for h2_tag in soup.select('h2:has(a)'):
    a_tag = h2_tag.find('a')
    urls.append(a_tag['href'])

print(urls)

However, we can be much more concise than the above:

urls = [i['href'] for i in soup.select('h2 > a')]
print(urls)

The above selects a elements that are direct children of an h2.
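
To make the difference between the two selectors concrete, here is a minimal sketch run against a small inline HTML snippet instead of the live page. The markup and the variable names has_links and child_links are made up for illustration, and html.parser is used only so the example does not depend on lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one h2 with a direct link, one with no link,
# and one where the link is nested inside a span.
html = """
<h2><a href="/first">First</a></h2>
<h2>No link here</h2>
<h2><span><a href="/nested">Nested</a></span></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# 'h2:has(a)' keeps every h2 that contains an a anywhere inside it,
# so find('a') is never None for these tags.
has_links = [h2.find("a")["href"] for h2 in soup.select("h2:has(a)")]

# 'h2 > a' matches only a elements that are direct children of an h2,
# so the link nested inside the span is skipped.
child_links = [a["href"] for a in soup.select("h2 > a")]

print(has_links)
print(child_links)
```

Which of the two fits depends on how the target page nests its links, so it is worth checking the actual markup before choosing.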

Upvotes: 1

MendelG

Reputation: 19998

h2_tag.find('a') returns None when an h2 contains no a tag, and accessing .attrs on None raises the AttributeError. You can handle this with a try/except:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.whitehouse.gov/briefings-statements/")
src = result.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for h2_tag in soup.find_all("h2"):
    try:
        # find() returns None when this h2 has no <a> inside it
        a_tag = h2_tag.find('a')
        urls.append(a_tag.attrs["href"])
    except AttributeError:
        # a_tag was None, so skip this heading
        continue

print(urls)
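
As an alternative to try/except, the same idea can be sketched with an explicit None check. This is only an illustration on made-up inline HTML, not the live page, and it uses html.parser so it does not depend on lxml. It also uses a_tag.get("href"), which returns None instead of raising KeyError when a link has no href attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a normal link, a heading without a link,
# and a link that has no href attribute.
html = """
<h2><a href="/first">First</a></h2>
<h2>No link here</h2>
<h2><a name="anchor">No href</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

urls = []
for h2_tag in soup.find_all("h2"):
    a_tag = h2_tag.find("a")
    if a_tag is None:
        # This h2 has no <a>, so there is nothing to collect.
        continue
    href = a_tag.get("href")  # None if the attribute is missing
    if href is not None:
        urls.append(href)

print(urls)
```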

Upvotes: 1