Web scraping with Python and Beautiful Soup

Question

I am practicing building web scrapers. One that I am working on now involves going to a site, scraping links for the various cities on that site, then taking all of the links for each of the cities and scraping all the links for the properties in said cites.

I'm using the following code:

import requests

from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Getting individual cities url
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title")  # Bottom page not loaded dynamycally
cities_links = [main_url + tag["href"] for tag in city_tags.find_all("a")]  # Links to cities

If I print out city_tags I get the HTML I want. However, when I print cities_links I get AttributeError: 'ResultSet' object has no attribute 'find_all'.

I gather from other q's on here that this error occurs because city_tags returns none, but this can't be the case if it is printing out the desired html? I have noticed that said html is in [] - does this make a difference?

akuiper · Accepted Answer

As the error says, the city_tags is a ResultSet which is a list of nodes and it doesn't have the find_all method, you either have to loop through the set and apply find_all on each individual node or in your case, I think you can simply extract the href attribute from each node:

[tag['href'] for tag in city_tags]

#['https://www.chapter-living.com/blog/',
# 'https://www.chapter-living.com/testimonials/',
# 'https://www.chapter-living.com/events/']

Web scraping with Python and Beautiful Soup

Answers (2)

Related Questions