Reputation: 799
I am practicing building web scrapers. The one I am working on now involves going to a site, scraping the links for the various cities on that site, then following each city link and scraping all the links for the properties in said cities.
I'm using the following code:
import requests
from bs4 import BeautifulSoup
main_url = "http://www.chapter-living.com/"
# Getting individual cities url
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title") # Bottom of page is not loaded dynamically
cities_links = [main_url + tag["href"] for tag in city_tags.find_all("a")] # Links to cities
If I print out city_tags, I get the HTML I want. However, when I print cities_links, I get AttributeError: 'ResultSet' object has no attribute 'find_all'.
I gather from other questions on here that this error occurs because city_tags returns None, but this can't be the case if it is printing out the desired HTML? I have noticed that said HTML is wrapped in [] - does this make a difference?
Upvotes: 5
Views: 8025
Reputation: 2698
Well, city_tags is a bs4.element.ResultSet (essentially a list) of tags, and you are calling find_all on the list itself rather than on a tag. You probably want to call find_all on every element of the result set or, in this specific case, just retrieve the href attribute of each element:
import requests
from bs4 import BeautifulSoup
main_url = "http://www.chapter-living.com/"
# Getting individual cities url
response = requests.get(main_url)  # renamed from `re` to avoid shadowing the standard-library module name
soup = BeautifulSoup(response.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title") # Bottom of page is not loaded dynamically
cities_links = [main_url + tag["href"] for tag in city_tags] # Links to cities
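To make the difference between a ResultSet and a single tag concrete, here is a minimal offline sketch. The HTML snippet, the "city" class and the paths in it are made up for illustration and are not taken from the real site; only find_all and the class_ keyword are actual Beautiful Soup API.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; class names and paths are invented.
html = """
<ul>
  <li class="city"><a class="nav-title" href="/london/">London</a></li>
  <li class="city"><a class="nav-title" href="/leeds/">Leeds</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which behaves like a list of Tag objects.
city_items = soup.find_all("li", class_="city")

# Calling find_all on the ResultSet itself reproduces the error from the question.
try:
    city_items.find_all("a")
    raised = False
except AttributeError:
    raised = True

# Option 1: search inside each Tag of the ResultSet.
nested_links = [a["href"] for item in city_items for a in item.find_all("a")]

# Option 2: if the matched tags already are the <a> elements, read the attribute directly.
direct_links = [tag["href"] for tag in soup.find_all("a", class_="nav-title")]

print(nested_links)
print(direct_links)
```

Option 2 is what the corrected list comprehension above does; option 1 is only needed when the links are nested inside the elements you matched.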
Upvotes: 5
Reputation: 215117
As the error says, city_tags is a ResultSet, which is a list of nodes, and it doesn't have a find_all method. You either have to loop through the set and apply find_all on each individual node or, in your case, I think you can simply extract the href attribute from each node:
[tag['href'] for tag in city_tags]
#['https://www.chapter-living.com/blog/',
# 'https://www.chapter-living.com/testimonials/',
# 'https://www.chapter-living.com/events/']
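One thing worth noting: the output above suggests the href values may already be absolute URLs, in which case prepending main_url would repeat the host. A small sketch of a more forgiving way to build the links, assuming you want both absolute and relative hrefs to work, is urljoin from the standard library. The relative path below is a made-up example, not something from the site.

```python
from urllib.parse import urljoin

main_url = "http://www.chapter-living.com/"

# One absolute href (as in the output above) and one hypothetical relative href.
hrefs = ["https://www.chapter-living.com/blog/", "/hypothetical-city/"]

# urljoin returns absolute URLs unchanged and resolves relative ones against main_url.
cities_links = [urljoin(main_url, href) for href in hrefs]
print(cities_links)
```

With real data you would replace the hrefs list with [tag['href'] for tag in city_tags].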
Upvotes: 3