Kevin Torres Silva

Reputation: 13

How to scrape text of a span sibling span?

Hello, I'm trying to learn web scraping, so I started by trying to scrape my school's menu.

I've run into a problem where I can't get the menu items under a span class; instead I only get the word "show", which sits on the same line as the span class.

Here is a short excerpt of the HTML I am trying to work with (posted as a screenshot, not reproduced here):

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(executable_path='chromedriver.exe')  # changed this
driver.get('https://housing.ucdavis.edu/dining/menus/dining-commons/tercero/')
results = []
content = driver.page_source
soups = BeautifulSoup(content, 'html.parser')
element=soups.findAll('span',class_ = 'collapsible-heading-status')
for span in element:
    print(span.text)

I tried changing it to span.span.text, but that didn't return anything. Can someone give me some pointers on how to extract the info under the collapsible-heading-status class?

Upvotes: 1

Views: 237

Answers (1)

HedgeHog

Reputation: 25048

Yummy waffles - as mentioned, they are gone, but to reach your goal, one approach is to select the names with a CSS selector using the adjacent sibling combinator:

for e in soup.select('.collapsible-heading-status + span'):
    print(e.text)

or with find_next_sibling():

for e in soup.find_all('span',class_ = 'collapsible-heading-status'):
    print(e.find_next_sibling('span').text)
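
Both variants can be tried offline on a small hand-written snippet. This is only a sketch: the HTML below is hypothetical markup modeled on the question's description (a status span immediately followed by a sibling span with the item name), not the actual page source. It also shows why span.span.text fails: the name span is a sibling of the status span, not a child of it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page):
# each status span is immediately followed by a sibling span with the name.
html = """
<ul>
  <li><span class="collapsible-heading-status">show</span><span>Housemade Granola</span></li>
  <li><span class="collapsible-heading-status">show</span><span>Fruit Salad</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# The status span contains only text, so looking for a child span gives None.
status = soup.find("span", class_="collapsible-heading-status")
child = status.span

# Variant 1: CSS adjacent sibling combinator.
names_css = [e.get_text(strip=True)
             for e in soup.select(".collapsible-heading-status + span")]

# Variant 2: find_next_sibling(), skipping status spans without a sibling span.
names_sibling = []
for e in soup.find_all("span", class_="collapsible-heading-status"):
    sibling = e.find_next_sibling("span")
    if sibling is not None:
        names_sibling.append(sibling.get_text(strip=True))

print(child)
print(names_css)
print(names_sibling)
```

The None check in the second variant is optional, but it avoids an AttributeError if some status span has no following span.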

Example

To get all the information for each item in a structured way, you could use:

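from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

# Selenium 4 expects the driver path to be passed through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))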
driver.get("https://housing.ucdavis.edu/dining/menus/dining-commons/tercero/")

soup = BeautifulSoup(driver.page_source, 'html.parser')

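data = []

# each .nutrition block belongs to one menu item
for e in soup.select('.nutrition'):
    d = {
        # walk backwards to the nearest preceding headings and name span
        'meal': e.find_previous('h4').text,
        'title': e.find_previous('h5').text,
        'name': e.find_previous('span').text,
        'description': e.p.text
    }
    # each h6 is a label; the element right after it holds the value
    d.update({n.text: n.find_next().text.strip(': ') for n in e.select('h6')})
    data.append(d)
data

Output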
[{'meal': 'Breakfast',
  'title': 'Fresh Inspirations',
  'name': 'Vanilla Chia Seed Pudding with Blueberrries',
  'description': 'Vanilla chia seed pudding with blueberries, shredded coconut, and toasted almonds',
  'Serving Size': '1 serving',
  'Calories': '392.93',
  'Fat (g)': '36.34',
  'Carbohydrates (g)': '17.91',
  'Protein (g)': '4.59',
  'Allergens': 'Tree Nuts/Coconut',
  'Ingredients': 'Coconut milk, chia seeds, beet sugar, imitation vanilla (water, vanillin, caramel color, propylene glycol, ethyl vanillin, potassium sorbate), blueberries, shredded sweetened coconut (desiccated coconut processed with sugar, water, propylene glycol, salt, sodium metabisulfite), blanched slivered almonds'},
 {'meal': 'Breakfast',
  'title': 'Fresh Inspirations',
  'name': 'Housemade Granola',
  'description': 'Crunchy and sweet granola made with mixed nuts and old fashioned rolled oats',
  'Serving Size': '1/2 cup',
  'Calories': '360.18',
  'Fat (g)': '17.33',
  'Carbohydrates (g)': '47.13',
  'Protein (g)': '8.03',
  'Allergens': 'Gluten/Wheat/Dairy/Peanuts/Tree Nuts',
  'Ingredients': 'Old fashioned rolled oats (per manufacturer, may contain wheat/gluten), sunflower seeds, seedless raisins, unsalted butter, pure clover honey, peanut-free mixed nuts (cashews, almonds, sunflower oil and/or cottonseed oil, pecans, hazelnuts, dried Brazil nuts, salt), light brown beet sugar, molasses'},...]
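
The find_previous() pattern above can also be checked without a browser. The following is a minimal sketch on hand-written HTML; the markup is an assumption made for illustration (an h4 meal heading, an h5 station heading, a name span, and a .nutrition block with h6 labels followed by value paragraphs) and may not match the real page exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
html = """
<h4>Breakfast</h4>
<h5>Fresh Inspirations</h5>
<ul>
  <li>
    <span class="collapsible-heading-status">show</span><span>Housemade Granola</span>
    <div class="nutrition">
      <p>Crunchy and sweet granola</p>
      <h6>Serving Size</h6><p>: 1/2 cup</p>
      <h6>Calories</h6><p>: 360.18</p>
    </div>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

data = []
for e in soup.select(".nutrition"):
    d = {
        # nearest preceding headings and span, searching backwards in the document
        "meal": e.find_previous("h4").text,
        "title": e.find_previous("h5").text,
        "name": e.find_previous("span").text,
        "description": e.p.text,
    }
    # each h6 is a label; the next tag after it holds the value
    d.update({n.text: n.find_next().text.strip(": ") for n in e.select("h6")})
    data.append(d)

print(data)
```

Because find_previous() searches backwards from each .nutrition block, the nearest preceding span is the name span rather than the "show" status span, which is why no explicit sibling lookup is needed here.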

Upvotes: 1
