Reputation: 41
I am trying to learn Python to scrape a website's lunch menu using BeautifulSoup. I have made the request:
import requests
from bs4 import BeautifulSoup

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
And the response looks like this:
<div class="lunchRow">
<div class="lunchRowDay"><h3>Monday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Meatballs</div>
<div class="lunchRowItemActual">Soup</div>
</div>
</div>
<div class="lunchRow">
<div class="lunchRowDay"><h3>Tuesday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Chicken</div>
<div class="lunchRowItemActual">Pork</div>
<div class="lunchRowItemActual">Fish</div>
</div>
</div>
What is the easiest way to get the lunchRowItemActual values for each day? I started by searching for the day and getting the next div, but after that I am lost, and I assume this is not the right way to solve it.
soup = soup.find(string="Monday").find_next('div').contents[0].text
Upvotes: 3
Views: 129
Reputation: 358
First, find all elements with the lunchRow class. Iterate through them, and in each lunchRow find the lunchRowDay to get the day.
Then use find_all to collect the lunchRowItemActual elements within that lunchRow.
# Find all the lunchRow divs
lunch_rows = soup.find_all('div', class_='lunchRow')
# Iterate through each lunchRow div
for row in lunch_rows:
    day = row.find('div', class_='lunchRowDay').text.strip()
    items = [item.text.strip() for item in row.find_all('div', class_='lunchRowItemActual')]
    print(f"{day}: {', '.join(items)}")
Monday: Meatballs, Soup
Tuesday: Chicken, Pork, Fish
Upvotes: 0
Reputation: 599
soup.select is a great way to do things like this. Then use get_text to... get the text. And a list comprehension will apply get_text to the whole list:
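rows = soup.select("div.lunchRow")
for row in rows:
    # Search from the row: the items are not nested inside lunchRowDay
    day = row.select_one("div.lunchRowDay")
    print(day.get_text())
    items = [item.get_text() for item in row.select("div.lunchRowItemActual")]
    print(items)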
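In the markup from the question, the lunchRowItemActual divs sit inside lunchRowItem, which is a sibling of lunchRowDay rather than a child of it. So if you do want to start from each day, one option is to step across to that sibling with find_next_sibling. This is only a minimal sketch under that assumption: SAMPLE_HTML is copied from the question, and items_per_day is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real script would pass r.text.
SAMPLE_HTML = """
<div class="lunchRow">
<div class="lunchRowDay"><h3>Monday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Meatballs</div>
<div class="lunchRowItemActual">Soup</div>
</div>
</div>
<div class="lunchRow">
<div class="lunchRowDay"><h3>Tuesday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Chicken</div>
<div class="lunchRowItemActual">Pork</div>
<div class="lunchRowItemActual">Fish</div>
</div>
</div>
"""


def items_per_day(html):
    """Return a list of (day, items) pairs, starting from each lunchRowDay."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for day in soup.select("div.lunchRowDay"):
        # The items live in the sibling lunchRowItem div, not inside the day div.
        container = day.find_next_sibling("div", class_="lunchRowItem")
        if container is None:
            # Assumption: a day without an item container simply has no items.
            items = []
        else:
            items = [
                item.get_text(strip=True)
                for item in container.select("div.lunchRowItemActual")
            ]
        result.append((day.get_text(strip=True), items))
    return result


pairs = items_per_day(SAMPLE_HTML)
for day_name, day_items in pairs:
    print(day_name, day_items)
```

Looping over the lunchRow divs is usually simpler, but this variant can help when the days and items are not wrapped in a common parent.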
Upvotes: 1
Reputation: 922
First off, you should get all lunchRow divs by their class name and save them to a variable like so:
rows = soup.find_all('div', attrs={'class': 'lunchRow'})
Then we can loop over them and get the individual days and items as follows. Here we get the first/only lunchRowDay item and then look for all lunchRowItemActual elements inside our current row:
for row in rows:
    print(row.find('div', attrs={'class': 'lunchRowDay'}).text)
    actuals = row.find_all('div', attrs={'class': 'lunchRowItemActual'})
    for actual in actuals:
        print(actual.text)
Output of this is:
Monday
Meatballs
Soup
Tuesday
Chicken
Pork
Fish
Instead of printing them out, you most likely want to put them in a dict, using the lunchRowDay text as the key and a list of the lunchRowItemActual values as the value, but that is up to you.
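A rough sketch of that dict approach might look like the following. It assumes the markup from the question (pasted in as SAMPLE_HTML so the snippet runs without a network request), and build_menu is just a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real script would pass r.text.
SAMPLE_HTML = """
<div class="lunchRow">
<div class="lunchRowDay"><h3>Monday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Meatballs</div>
<div class="lunchRowItemActual">Soup</div>
</div>
</div>
<div class="lunchRow">
<div class="lunchRowDay"><h3>Tuesday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Chicken</div>
<div class="lunchRowItemActual">Pork</div>
<div class="lunchRowItemActual">Fish</div>
</div>
</div>
"""


def build_menu(html):
    """Return {day: [items]} with one entry per lunchRow div."""
    soup = BeautifulSoup(html, "html.parser")
    menu = {}
    for row in soup.find_all("div", attrs={"class": "lunchRow"}):
        # The day name is the text of the single lunchRowDay div in this row.
        day = row.find("div", attrs={"class": "lunchRowDay"}).get_text(strip=True)
        actuals = row.find_all("div", attrs={"class": "lunchRowItemActual"})
        menu[day] = [actual.get_text(strip=True) for actual in actuals]
    return menu


menu = build_menu(SAMPLE_HTML)
print(menu["Tuesday"])  # ['Chicken', 'Pork', 'Fish']
```

With the menu in a dict you can look up a single day directly instead of looping over every row again.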
Upvotes: 3