Reputation: 57
I'm going a little crazy here. I can't work out why I'm unable to retrieve the course details from the page I'm trying to scrape. I want the course details, specifically the time and day information, so that I can add them to a dataframe, just as I did with the course titles. The course details are dynamic and only appear when the course title is expanded, which is why I'm using Selenium, but I think I'm using it incorrectly. This is my web scraper so far:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup
PATH = "/Users/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get("https://sa.ucla.edu/ro/Public/SOC/Results?t=19F&sBy=subject&sName=Computer+Science+%28COM+SCI%29&subj=COM+SCI&crsCatlg=Enter+a+Catalog+Number+or+Class+Title+%28Optional%29&catlg=&cls_no=&btnIsInIndex=btn_inIndex")
page_source = driver.page_source
# Click to open all of the class title links
link = driver.find_element_by_link_text("Expand All Classes")
link.click()
# Passing data to beautifulsoup
soup = BeautifulSoup(page_source, 'lxml')
# Retrieving course titles
class_name=[] # List where class names are stored
# create a loop that runs through all <div> elements
for row in soup.find_all('div', attrs={'class':'row-fluid class-title'}):
    class_name.append(row.text.strip('\n '))
print(class_name)
The details are stored here:
The last code I tried is below, but it came back with an empty list. I'm not sure whether the list approach would work here:
# Trying to retrieve course details
class_details=[] # List where class details are stored
# create a loop that runs through all <div> elements
for row in soup.find_all('div', attrs={'class':'row-fluid header-row class-info'}):
    class_details.append(row.text.strip('\n '))
print(class_details)
Upvotes: 0
Views: 113
Reputation: 171
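You are referencing the wrong class for your course details. Note also that your script reads driver.page_source before it clicks "Expand All Classes", so the soup never contains the expanded details. Here's a script that prints what you need: the course title plus the time and day. There's no need for BeautifulSoup.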
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
PATH = "chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://sa.ucla.edu/ro/Public/SOC/Results?t=19F&sBy=subject&sName=Computer+Science+%28COM+SCI%29&subj=COM+SCI&crsCatlg=Enter+a+Catalog+Number+or+Class+Title+%28Optional%29&catlg=&cls_no=&btnIsInIndex=btn_inIndex")
# Click to open all of the class title links
link = driver.find_element_by_link_text("Expand All Classes")
link.click()
course_name=[] # List where class names are stored
#find all course titles and print list
courseTitleElems = driver.find_elements_by_xpath("//h3[@class='head']")
for courseTitleElem in courseTitleElems:
    courseTitle = courseTitleElem.text
    course_name.append(courseTitle)
    timeElems = courseTitleElem.find_elements_by_xpath("following-sibling::div/div[contains(@id,'children')]//div[contains(@class,'timeColumn')]")
    print("\n" + courseTitle)
    for timeElem in timeElems:
        timeColumn = timeElem.get_attribute("textContent").strip("\n")
        print("\n" + timeColumn)
#print list
print(course_name)
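Since the goal was to get the times into a dataframe rather than print them, here is a rough sketch of how the scraped text could be turned into rows. It assumes you collect the values from the loop above into a dict mapping each course title to its list of time strings. The names normalize and build_rows and the sample data are hypothetical and only for illustration; the real textContent on the page may look different.

```python
def normalize(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def build_rows(course_times):
    """Return one {'course', 'time'} dict per (course title, time entry) pair."""
    rows = []
    for title, times in course_times.items():
        for raw in times:
            rows.append({"course": normalize(title), "time": normalize(raw)})
    return rows


# Hypothetical sample data standing in for the scraped text.
sample = {
    "Course A": ["\n  MW\n  10am-11:50am\n"],
    "Course B": ["TR\n2pm-3:50pm", "F\n\n12pm-1:50pm"],
}

rows = build_rows(sample)
for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame(rows) should then give a table with "course" and "time" columns. A course with no time entries produces no rows in this sketch, so you may want to handle that case separately.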
Upvotes: 1