Shawn Schreier

Reputation: 904

Search for class element underneath another element

I scrape daily lineups and need to find out whether a team does not have its lineup posted. In that case, there is an element with the class lineup__no. I'd like to look at each team, check whether its lineup is posted, and if not, add that team's index to a list. For example, if 4 teams are playing and the first and third teams do not have a lineup posted, I want to return the list [0, 2]. I'm guessing a list comprehension of some sort may get me there, but I'm struggling to come up with what I need. For now, I tried a for loop to get each of the items under the main header. I've also tried adding each li item's text to a list and searching for "Unknown Lineup", but was unsuccessful.
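At its core, the request is a filter over enumerate. Here is a minimal sketch, assuming the page has already been reduced to one True/False flag per team in page order. The helper name missing_lineup_indices and its input list are hypothetical:

```python
def missing_lineup_indices(has_lineup):
    """Return the indexes of teams whose lineup is not posted.

    has_lineup is a hypothetical list of booleans, one per team in page order,
    where True means that team's lineup is posted.
    """
    # enumerate pairs each flag with its position; keep positions with no lineup
    return [idx for idx, posted in enumerate(has_lineup) if not posted]


# The example from the question: 4 teams, the 1st and 3rd have no lineup posted
print(missing_lineup_indices([False, True, False, True]))  # [0, 2]
```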

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import requests
import pandas as pd

#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'

##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

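# Each game on the page is a block with the classes "lineup" and "is-mlb"
games = soup.select('.lineup.is-mlb')
for game in games:
    # Every <li> inside this game block
    initial_list = game.find_all('li')
    print(initial_list)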
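Searching for an element underneath another element boils down to two steps: find each team's container, then check whether a lineup__no element sits inside it. The sketch below illustrates that idea with Python's standard-library html.parser in place of BeautifulSoup, so it runs without extra packages. It parses a small hypothetical snippet. The per-team `ul.lineup__list` containers are an assumption for illustration, not confirmed rotowire markup:

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup: one <ul class="lineup__list"> per team.
# The real rotowire page structure is an assumption here and may differ.
SAMPLE_HTML = """
<div class="lineup is-mlb">
  <ul class="lineup__list is-visit"><li class="lineup__no">Not posted</li></ul>
  <ul class="lineup__list is-home"><li class="lineup__player">Player A</li></ul>
</div>
<div class="lineup is-mlb">
  <ul class="lineup__list is-visit"><li class="lineup__no">Not posted</li></ul>
  <ul class="lineup__list is-home"><li class="lineup__player">Player B</li></ul>
</div>
"""


class TeamLineupParser(HTMLParser):
    """Record, per team list, whether a lineup__no element appears inside it.

    Assumes a team list never contains a nested <ul>.
    """

    def __init__(self):
        super().__init__()
        self.team_missing = []  # one bool per team, in page order
        self.in_team_list = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "lineup__list" in classes:
            # A new team's list starts; treat its lineup as posted until proven otherwise
            self.in_team_list = True
            self.team_missing.append(False)
        elif self.in_team_list and "lineup__no" in classes:
            self.team_missing[-1] = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_team_list = False


parser = TeamLineupParser()
parser.feed(SAMPLE_HTML)
parser.close()
no_lineup = [idx for idx, missing in enumerate(parser.team_missing) if missing]
print(no_lineup)  # [0, 2]
```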

Upvotes: 1

Views: 74

Answers (2)

chitown88

Reputation: 28620

Look under the <li> tags with class="lineup__status", and use enumerate to track each one's index as you iterate. I don't have an example of the page with some teams' lineups already in (I'll check later as the lineups get populated), so I'd likely make the check more robust than if lineupStatus.text.strip() == 'Unknown Lineup'. Until I can see exactly how the HTML looks at that point, I'll have to assume the "lineup__no" class is always present. Like I said, once I see how this page looks with some lineups in, I'll adjust it.

By the way, "The Guardians lineup has not been posted yet." threw me off there for a second... I totally forgot about that!

from bs4 import BeautifulSoup
import requests
import re

#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'

##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

# Lineup status items: <li> tags whose class starts with "lineup__status"
lineupStatuses = soup.find_all('li', {'class':re.compile('^lineup__status')})

# Record the index of every status that is not marked "is-confirmed"
noLineupIndex = []
for idx, lineupStatus in enumerate(lineupStatuses):
    if 'is-confirmed' not in lineupStatus['class']:
        noLineupIndex.append(idx)

# Or use a list comprehension
#noLineupIndex = [idx for idx, lineupStatus in enumerate(lineupStatuses) if 'is-confirmed' not in lineupStatus['class']]

Output:

print(noLineupIndex)
[0, 3, 6, 7, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]

Upvotes: 1

Prophet

Reputation: 33361

Since I'm more familiar with Selenium, I'll give you a Selenium solution.
Please see my explanations in the code comments.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.rotowire.com/baseball/daily-lineups.php")
#wait for at least 1 game element to be visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lineup.is-mlb")))
#add a short delay so that all the other games are loaded
time.sleep(0.5)
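#get all the games blocks
games = driver.find_elements(By.CSS_SELECTOR,".lineup.is-mlb")
#iterate over the games elements with their indexes in a list comprehension
#each game holds two teams, so game idx maps to team indexes idx*2 and idx*2+1
#note: both team indexes are added if the game block contains any 'lineup__no' item
no_lineup = [j for idx, game in enumerate(games) for j in [idx*2, idx*2+1] if game.find_elements(By.XPATH, ".//li[@class='lineup__no']")]
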
#print the collected results
print(no_lineup)
#quit the driver
driver.quit()
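The nested comprehension above turns each game index into two team indexes. If the two teams in a game can be checked separately, the same mapping can mark only the team that is actually missing its lineup. Here is a pure-Python sketch of that mapping, assuming the visiting team comes first in each game block. The games data below is hypothetical and stands in for whatever per-team check you run on the page:

```python
# Hypothetical per-game data: (visitor_missing, home_missing) for each game,
# where True means that team has no lineup posted (e.g. a 'lineup__no' item).
games = [(True, False), (False, False), (True, True)]

# Game idx covers team indexes 2*idx (visitor, side 0) and 2*idx + 1 (home, side 1);
# only the sides that are actually missing a lineup are kept.
no_lineup = [
    2 * idx + side
    for idx, sides in enumerate(games)
    for side, missing in enumerate(sides)
    if missing
]
print(no_lineup)  # [0, 4, 5]
```

With Selenium, the per-side flags would come from searching inside each team's own container within a game rather than the whole game block. This thread does not show which selector identifies those containers.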

Upvotes: 1
