Reputation: 563
I am scraping a page using requests and bs4. I am trying to scrape the pages it links to, but my script only ever scrapes the page I started on.
Specifically, I am scraping the page at https://untappd.com/v/beer-culture/893427 to get the beer names from that page. The menu section has a dropdown that links to different menus, which all have the same page structure. I have been able to extract the links to the linked menu pages (see print(menu_urls) in the script). I have tried iterating through the list of links, creating a new soup for each and scraping it, but it only scrapes the original page n times, where n is the length of the list of URLs. So in my case, instead of scraping this list:
['https://untappd.com/v/beer-culture/893427?menu_id=1489', 'https://untappd.com/v/beer-culture/893427?menu_id=116472']
it only scrapes the original https://untappd.com/v/beer-culture/893427 twice.
Here is my script:
import requests
from bs4 import BeautifulSoup

venue_url = 'https://untappd.com/v/beer-culture/893427'
count = 0
response = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

def get_menu_beers(soup):
    global count
    menu = soup.find('div', {'class': 'menu-area'})
    beers_all = menu.find_all('ul', {'class': 'menu-section-list'})
    for beer_group in beers_all:
        beers = beer_group.find_all('li')
        for beer in beers:
            details = beer.find('div', {'class': 'beer-details'})
            name_ = details.find("a",{"class":"track-click"}).text
            count = count + 1
            print(count, ' ', name_)

select_options = soup.find_all('select', {'class':'menu-selector'})
options_list = select_options[0].find_all('option')
menu_ids =[]
for option in options_list:
    menu_ids.append(int(option['value']))
menu_urls = []
for menu_id in menu_ids:
    menu_url = str(venue_url)+ '?menu_id=' + str(menu_id)
    menu_urls.append(menu_url)
print(menu_urls)
for url in menu_urls:
    res = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
    s = BeautifulSoup(res.text, 'html.parser')
    get_menu_beers(s)
Upvotes: 0
Views: 44
Reputation: 1
The issue is that you're always requesting the original venue_url instead of the current url from the menu_urls list. In your last loop, you're still passing venue_url to requests.get(), so it fetches the same page every time instead of the linked menu pages.
Replace this line inside your loop:
res = requests.get(venue_url, headers={'User-agent': 'Mozilla/5.0'})
with:
res = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
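If you want to confirm the fix without hitting the site, one option is to stub out the fetch step and record which URLs the loop actually requests. This is only a sketch: `scrape_all` and `fake_fetch` are hypothetical names that are not in your script, and `fetch` stands in for whatever wraps `requests.get(...).text`.

```python
def scrape_all(menu_urls, fetch):
    """Fetch every menu URL once and return the pages keyed by URL.

    `fetch` is a hypothetical stand-in for something like
    lambda u: requests.get(u, headers={'User-agent': 'Mozilla/5.0'}).text
    """
    pages = {}
    for url in menu_urls:
        # Use the loop variable `url`, not the original venue URL.
        pages[url] = fetch(url)
    return pages


menu_urls = [
    'https://untappd.com/v/beer-culture/893427?menu_id=1489',
    'https://untappd.com/v/beer-culture/893427?menu_id=116472',
]

requested = []


def fake_fetch(url):
    # Record the URL instead of making a network request.
    requested.append(url)
    return '<html>stub for ' + url + '</html>'


pages = scrape_all(menu_urls, fake_fetch)
print(requested == menu_urls)
```

If the loop passed the venue URL by mistake, `requested` would contain the same address twice and the comparison would be false.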
Upvotes: 0
Reputation: 2010
In your last few lines of code, you should pass the url from the menus instead of venue_url:
for url in menu_urls:
    #### pass in url not venue_url ####
    res = requests.get(url, headers = {'User-agent': 'Mozilla/5.0'})
    s = BeautifulSoup(res.text, 'html.parser')
    get_menu_beers(s)
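As a side note, the menu URLs can also be built with the standard library's `urllib.parse.urlencode` rather than string concatenation, which keeps the query string correctly escaped. This is a minimal sketch, not a required change; `build_menu_urls` is a hypothetical helper name, and the menu ids are taken from the list in the question.

```python
from urllib.parse import urlencode


def build_menu_urls(venue_url, menu_ids):
    """Return one URL per menu id, with menu_id as a query parameter."""
    return [venue_url + '?' + urlencode({'menu_id': menu_id})
            for menu_id in menu_ids]


menu_urls = build_menu_urls(
    'https://untappd.com/v/beer-culture/893427',
    [1489, 116472],
)
for url in menu_urls:
    print(url)
```

With requests you could alternatively keep venue_url as is and pass `params={'menu_id': menu_id}` to `requests.get`, letting the library build the query string for you.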
Upvotes: 1