Reputation: 41
I'm using Selenium + Python to scrape match results from a Battlefy page so I can process them later and enter them into a database. The page loads its content dynamically with JavaScript, so I need a (headless) browser to get the team names and results. I'm trying to get the text of each college by class name, but Selenium's find_elements_by_class_name method doesn't seem to be working.
Current code:
>>> from selenium import webdriver
>>> chrome_path = r"C:\Users\...\chromedriver.exe"
>>> driver = webdriver.Chrome(chrome_path)
>>> driver.get("https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1")
>>> teams = driver.find_elements_by_class_name("team-name overflow-ellipsis float-right")
>>> for item in teams:
...     print(item.text)
This prints nothing, and find_elements_by_class_name returns an empty list. I must be doing something incorrectly. How can I scrape each team name's text when the element has several class names?
Upvotes: 2
Views: 744
Reputation: 14135
team-name overflow-ellipsis float-right is a combination of classes, and when you use the find_elements_by_class_name / find_element_by_class_name method, the locator is converted to a CSS selector internally by the Selenium library. Hence you have to replace every space with a dot (.).
Try the following:
team = driver.find_elements_by_class_name("team-name.overflow-ellipsis.float-right")
Edit 1:
In the Selenium implementation, the locator is prepended with a dot (.) and By.CSS_SELECTOR is used internally, so we don't have to add a dot before the first class name.
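As a rough illustration of the conversion described above, here is a minimal sketch in plain Python. The helper names class_attr_to_class_name_locator and class_attr_to_css_selector are hypothetical (they are not part of Selenium); they only show how a space-separated class attribute maps to the string you pass to find_elements_by_class_name, and to the CSS selector you would get once the leading dot is prepended.

```python
def class_attr_to_class_name_locator(class_attr):
    """Join space-separated class names with '.' (no leading dot).

    Hypothetical helper: the result is the form to pass to a class-name
    locator, assuming the library prepends the first '.' itself.
    """
    tokens = class_attr.split()
    if not tokens:
        raise ValueError("class attribute is empty")
    return ".".join(tokens)


def class_attr_to_css_selector(class_attr):
    """Build the equivalent CSS selector by adding the leading dot."""
    return "." + class_attr_to_class_name_locator(class_attr)


if __name__ == "__main__":
    attr = "team-name overflow-ellipsis float-right"
    print(class_attr_to_class_name_locator(attr))  # team-name.overflow-ellipsis.float-right
    print(class_attr_to_css_selector(attr))        # .team-name.overflow-ellipsis.float-right
```

The first form is what goes into the class-name call; the second is what you would pass with By.CSS_SELECTOR.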
Upvotes: 1
Reputation: 193188
To scrape the team names using Selenium and Python, use WebDriverWait with visibility_of_all_elements_located() so that the dynamically loaded elements are visible before you read their text. You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1")
print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".team-name.overflow-ellipsis.float-right")))])
Using XPATH:
driver.get("https://battlefy.com/college-league-of-legends/2020-north-conference/5de98dd4196d1311d9e6edbd/stage/5e23b6e395e72856dac06997/bracket/1")
print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='team-name overflow-ellipsis float-right']")))])
Console Output:
['Cougars', 'University of Illinois at Urbana-Champaign', 'Maryville Esports', 'Michigan State University', 'Purdue University', 'Illinois Wesleyan Titans', 'UMN Varsity Gold', 'UC LoL A Team', 'Arbor Esports', 'CWRU 300 Spartans', 'Bethany Esports', 'BGC at OSU', 'University of Wisconsin', 'CGC UIC', 'Indiana University - Purdue University Indianapolis - High Tempo Gaming', 'Missouri State University', 'KSU Wildcats', 'University of Manitoba Bisons', 'Nebraska', 'S&T eSports', 'Illinois State University - Redbird Esports', 'WUSTL Bears', 'University of Iowa A Team', 'TSUES', 'Division 2+', 'Grizzlies', 'Principia College esports', 'Northwestern Varsity', 'Wright State University - Raiders', 'Milwaukee School of Engineering - Raiders', 'UPIKE Esports', 'UMDads', 'Jayhawk Esports', 'NKU Esports', 'Warriors', 'Spartans', 'ND Lol', 'SDSU Team Alpha', 'Rose-Hulman', 'SIUe eSports', 'UND', 'MTU GOLD', 'Polar Bears', 'Purdue Fort Wayne Esports', 'CSU LOL', 'Aquinas Esports', 'Shawnee State Bears', 'Lewis Flyers', 'NDSU League of Legends Club', 'South Dakota Mines - Hardrockers', 'GVSU Laker Legends', 'G&E Club @ Iowa State University', 'MVC Vikings', 'Match from North (Dukes)']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
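One caveat about the XPATH variant: @class='team-name overflow-ellipsis float-right' compares the whole attribute string, so it stops matching if the page reorders the classes or adds another one. A minimal sketch of a more tolerant expression, assuming you want to match each class as a separate token, is below. The helper name xpath_for_classes is hypothetical and only builds the string; you would still pass the result with By.XPATH.

```python
def xpath_for_classes(tag, *class_names):
    """Build an XPath that matches `tag` elements carrying every given class.

    Hypothetical helper. Each class is matched as a whole token by padding
    the normalized class attribute with spaces, so order and extra classes
    on the element do not matter.
    """
    if not class_names:
        raise ValueError("at least one class name is required")
    predicates = "".join(
        "[contains(concat(' ', normalize-space(@class), ' '), ' {} ')]".format(name)
        for name in class_names
    )
    return "//{}{}".format(tag, predicates)


if __name__ == "__main__":
    print(xpath_for_classes("div", "team-name", "overflow-ellipsis", "float-right"))
```

The CSS selector .team-name.overflow-ellipsis.float-right already behaves this way, which is one reason it is usually the simpler choice.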
Upvotes: 0