Reputation: 425
I had a working piece of code, and then I ran it today and it was broken. I have pulled out the section that is causing the problem.
from bs4 import BeautifulSoup
import requests
webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')
soup = BeautifulSoup(webpage.content)
links = soup.find("div",{"class":"main row grid-padding"}).find_all("h2",{"class":"node-title"})
for link in links:
    print(link.a["href"])
This gives me the error "AttributeError: 'NoneType' object has no attribute 'find_all'".
What precisely is this error telling me?
find_all() is a valid method according to the Beautiful Soup documentation, and looking through the webpage's source code, the path to the element I want seems to make sense.
I think something must have changed on the website, because I don't see how my code could just stop working on its own. But I don't really understand the error message...
Thanks for any help you can give!
Upvotes: 0
Views: 1615
Reputation: 20553
This is because when you requested the page, the server refused access (it returned an Access Denied page), so soup.find() found no matching div and returned None. None has no find_all() method, so calling it raises an AttributeError.
from bs4 import BeautifulSoup
import requests
webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=')
print(webpage.content)
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.bbcgoodfood.com/search/recipes?" on this server.<P>
Reference #18.4fa9cd17.1428789762.680369dc
</BODY>
</HTML>
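A minimal offline sketch of the failure, assuming Beautiful Soup 4 with the built-in `html.parser` backend: it parses an abridged copy of the Access Denied page shown above, so `find()` has no matching `div` to return and gives back `None`, and it is the chained `find_all()` call on that `None` that raises.

```python
from bs4 import BeautifulSoup

# Abridged copy of the body the server returned instead of the search results
denied_html = """<HTML><HEAD><TITLE>Access Denied</TITLE></HEAD>
<BODY><H1>Access Denied</H1></BODY></HTML>"""

soup = BeautifulSoup(denied_html, "html.parser")

# The error page contains no <div class="main row grid-padding">, so find() returns None
container = soup.find("div", {"class": "main row grid-padding"})
print(container)  # None

try:
    # Calling any method on None raises AttributeError
    container.find_all("h2", {"class": "node-title"})
except AttributeError as exc:
    message = str(exc)
    print(message)  # 'NoneType' object has no attribute 'find_all'
```

Splitting the chained call into two steps like this makes it obvious which lookup came back empty.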
If you fix this by sending a header with a browser-like user agent, as @Vader suggested, your code then runs fine:
...
headers = {'User-agent': 'Mozilla/5.0'}
webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=', headers=headers)
soup = BeautifulSoup(webpage.content)
links = soup.find("div",{"class":"main row grid-padding"}).find_all("h2",{"class":"node-title"})
for link in links:
    print(link.a["href"])
/recipes/4942/lemon-drizzle-cake
/recipes/3092/ultimate-chocolate-cake
/recipes/3228/chilli-con-carne
/recipes/3229/yummy-scrummy-carrot-cake
/recipes/1223/bestever-brownies
/recipes/1167651/chicken-and-chorizo-jambalaya
/recipes/2089/spiced-carrot-and-lentil-soup
/recipes/1521/summerinwinter-chicken
/recipes/1364/spicy-root-and-lentil-casserole
/recipes/4814/mustardstuffed-chicken
/recipes/4622/classic-scones-with-jam-and-clotted-cream
/recipes/333614/red-lentil-chickpea-and-chilli-soup
/recipes/5605/falafel-burgers
/recipes/11695/raspberry-bakewell-cake
/recipes/4686/chicken-biryani
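A sketch of a more defensive version, assuming `requests` and Beautiful Soup 4. The function names `extract_recipe_links` and `fetch_recipe_links` are hypothetical, the 10-second timeout is an arbitrary choice, and the URL and class names are taken from the question (the site's markup has very likely changed since). `raise_for_status()` turns a 403 into a `requests.HTTPError`, and an explicit `None` check replaces the confusing `NoneType` message with one that says what went missing.

```python
import requests
from bs4 import BeautifulSoup

URL = "http://www.bbcgoodfood.com/search/recipes?query="
HEADERS = {"User-Agent": "Mozilla/5.0"}


def extract_recipe_links(html):
    """Return the href of each recipe title link, failing loudly if the results container is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "main row grid-padding"})
    if container is None:
        raise ValueError(
            "results container not found; the page may be an error page or the layout changed"
        )
    # Skip any <h2> that has no <a> child instead of crashing on it
    return [h2.a["href"] for h2 in container.find_all("h2", {"class": "node-title"}) if h2.a]


def fetch_recipe_links(url=URL):
    # timeout=10 is an assumed value, not something the site requires
    response = requests.get(url, headers=HEADERS, timeout=10)
    # Raise requests.HTTPError on a 403 (or any 4xx/5xx) instead of parsing an error page
    response.raise_for_status()
    return extract_recipe_links(response.content)
```

Keeping the parsing in its own function also means it can be checked against saved HTML without hitting the site.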
Upvotes: 1
Reputation: 3873
The site you are trying to parse doesn't "like" your user agent and returns a 403 error, so the parser fails because it cannot find the div. Try changing the user agent to one that a browser would send:
webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=', headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'})
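To see which header the site most likely objected to, you can inspect what `requests` would send without making a network call. This sketch assumes a reasonably recent `requests`: a `Session` starts with a `User-Agent` of the form `python-requests/<version>`, updating `session.headers` applies a browser-style value to every request made through that session, and `prepare_request()` builds the request without sending it.

```python
import requests

session = requests.Session()
# By default requests identifies itself as "python-requests/<version>"
print(session.headers["User-Agent"])

# Set a browser-style User-Agent once; every request made through this session sends it
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
})

# Build (but do not send) a request so the outgoing headers can be inspected
prepared = session.prepare_request(
    requests.Request("GET", "http://www.bbcgoodfood.com/search/recipes?query=")
)
print(prepared.headers["User-Agent"])
```

Using a session rather than passing `headers=` on every call keeps the header in one place if you fetch several pages.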
Upvotes: 2