Scraping multiple pages with beautifulsoup4 using python 3.6.3

Question

I am trying to loop through multiple pages and my code doesn't extract anything. I am kind of new to scraping so bear with me. I made a container so I can target each listing. I also made a variable to target the anchor tag that you would press to go to the next page. I would really appreciate any help I could get. Thanks.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

for page in range(0,25):
    file = "breakfeast_chicago.csv"
    f = open(file, "w")
    Headers = "Nambusiness_name, business_address, business_city, business_region, business_phone_number
"
f.write(Headers)

my_url = 'https://www.yellowpages.com/search?search_terms=Stores&geo_location_terms=Chicago%2C%20IL&page={}'.format(page)

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()   

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each listing
containers = page_soup.findAll("div",{"class": "result"})

new = page_soup.findAll("a", {"class":"next ajax-page"})

for i in new:
    try:
        for container in containers:
            b_name = i.find("container.h2.span.text").get_text()
            b_addr = i.find("container.p.span.text").get_text()

            city_container = container.findAll("span",{"class": "locality"})
            b_city = i.find("city_container[0].text ").get_text()

            region_container = container.findAll("span",{"itemprop": "postalCode"})
            b_reg = i.find("region_container[0].text").get_text()

            phone_container = container.findAll("div",{"itemprop": "telephone"})
            b_phone = i.find("phone_container[0].text").get_text()

            print(b_name, b_addr, b_city, b_reg, b_phone)
            f.write(b_name + "," +b_addr + "," +b_city.replace(",", "|") + "," +b_reg + "," +b_phone + "
")
    except: AttributeError
f.close()

SIM · Accepted Answer

You can try like the below script. It will traverse different pages through pagination and collect name and phone numbers from each container.

import requests
from bs4 import BeautifulSoup

my_url = "https://www.yellowpages.com/search?search_terms=Stores&geo_location_terms=Chicago%2C%20IL&page={}"
for link in [my_url.format(page) for page in range(1,5)]:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")

    for item in soup.select(".info"):
        try:
            name = item.select(".business-name [itemprop='name']")[0].text
        except Exception:
            name = ""
        try:
            phone = item.select("[itemprop='telephone']")[0].text
        except Exception:
            phone = ""

        print(name,phone)

Scraping multiple pages with beautifulsoup4 using python 3.6.3

Answers (2)

Related Questions