How to scrape a page with BeautifulSoup and Python?

Question

I am trying to extract information from the BBC Good Food website, but I am having some trouble narrowing down the data I'm collecting.

Here's what I have so far:

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=tomato')
soup = BeautifulSoup(webpage.content, "html.parser")
links = soup.find_all("a")

for anchor in links:
    print(anchor.get('href'), anchor.text)

This returns every link on the page along with its text, but I only want the links inside the 'article' elements. These are the links to the individual recipes.

Through some experimentation I have managed to get the text from the articles, but I can't seem to extract the links.

Padraic Cunningham · Accepted Answer

The only two attributes I see inside the article tags are the link's href and the image's src:

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=tomato')
soup = BeautifulSoup(webpage.content, "html.parser")
links = soup.find_all("article")

for ele in links:
    print(ele.a["href"])
    print(ele.img["src"])

The recipe links are inside the h2 elements with class="node-title":

from bs4 import BeautifulSoup
import requests

webpage = requests.get('http://www.bbcgoodfood.com/search/recipes?query=tomato')
soup = BeautifulSoup(webpage.content, "html.parser")

# Restrict the search to the results container, then take each recipe heading
links = soup.find("div", {"class": "main row grid-padding"}).find_all("h2", {"class": "node-title"})

for l in links:
    print(l.a["href"])

/recipes/681646/tomato-tart
/recipes/4468/stuffed-tomatoes
/recipes/1641/charred-tomatoes
/recipes/tomato-confit
/recipes/1575635/roast-tomatoes
/recipes/2536638/tomato-passata
/recipes/2518/cherry-tomatoes
/recipes/681653/stuffed-tomatoes
/recipes/2852676/tomato-sauce
/recipes/2075/tomato-soup
/recipes/339605/tomato-sauce
/recipes/2130/essence-of-tomatoes-
/recipes/2942/tomato-tarts
/recipes/741638/fried-green-tomatoes-with-ripe-tomato-salsa
/recipes/3509/honey-and-thyme-tomatoes
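
The same lookup can also be written as a single CSS selector with select. This is only a minimal sketch, run against a small hand-written HTML snippet instead of the live site: SAMPLE_HTML and the recipe_hrefs helper are made up for illustration, and the markup is an assumption modelled on the class names used above, so the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names above; the live page may differ.
SAMPLE_HTML = """
<div class="main row grid-padding">
  <article>
    <h2 class="node-title"><a href="/recipes/681646/tomato-tart">Tomato tart</a></h2>
  </article>
  <article>
    <h2 class="node-title"><a href="/recipes/2075/tomato-soup">Tomato soup</a></h2>
  </article>
</div>
<a href="/about">About</a>
"""


def recipe_hrefs(html):
    """Return the href of every link nested in an <h2 class="node-title">."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector skips unrelated anchors such as the "/about" link.
    return [a["href"] for a in soup.select("h2.node-title a[href]")]


print(recipe_hrefs(SAMPLE_HTML))
# ['/recipes/681646/tomato-tart', '/recipes/2075/tomato-soup']
```

To try this on the real page, pass webpage.content in place of SAMPLE_HTML.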

The hrefs are relative, so to request a recipe page you need to prepend http://www.bbcgoodfood.com:

for l in links:
    print(requests.get("http://www.bbcgoodfood.com{}".format(l.a["href"])).status_code)
200
200
200
200
200
200
200
200
200
200
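
As an alternative to formatting the string by hand, the standard library's urljoin can build the absolute URL. The sketch below makes no network requests; BASE_URL and the absolute_urls helper are names chosen here for illustration, and the hrefs are copied from the output above.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.bbcgoodfood.com"  # site root taken from the answer above


def absolute_urls(hrefs, base=BASE_URL):
    """Resolve each (possibly relative) href against the base URL."""
    # urljoin leaves hrefs that are already absolute unchanged.
    return [urljoin(base, href) for href in hrefs]


print(absolute_urls(["/recipes/681646/tomato-tart", "/recipes/tomato-confit"]))
# ['http://www.bbcgoodfood.com/recipes/681646/tomato-tart',
#  'http://www.bbcgoodfood.com/recipes/tomato-confit']
```

Each resulting URL can then be passed to requests.get as in the loop above.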
