Mitzi

Reputation: 57

How can I retrieve this dynamic content using Selenium and BeautifulSoup in a web scraper?

I'm going a little crazy here. I can't figure out why I'm unable to retrieve the course details from the page I'm trying to scrape. Specifically, I want the time and day information so that I can add it to a DataFrame, just as I did with the course titles. The course details are dynamic and only appear when a course title is expanded, which is why I'm using Selenium, but I think I'm using it incorrectly. This is my web scraper so far:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup

PATH = "/Users/chromedriver"

driver = webdriver.Chrome(PATH)
driver.get("https://sa.ucla.edu/ro/Public/SOC/Results?t=19F&sBy=subject&sName=Computer+Science+%28COM+SCI%29&subj=COM+SCI&crsCatlg=Enter+a+Catalog+Number+or+Class+Title+%28Optional%29&catlg=&cls_no=&btnIsInIndex=btn_inIndex")
page_source = driver.page_source


# Click to open all of the class title links 
link = driver.find_element_by_link_text("Expand All Classes")
link.click()


# Passing data to beautifulsoup 
soup = BeautifulSoup(page_source, 'lxml')

# Retrieving course titles
class_name=[] # List where class names are stored
# create a loop that runs through all <div> elements
for row in soup.find_all('div', attrs={'class':'row-fluid class-title'}):
    class_name.append(row.text.strip('\n '))
print(class_name)

The details are stored here: [screenshot omitted]

The last code I tried was this, but it returned an empty list. I'm not sure whether the list approach works here:

# Trying to retrieve course details
class_details=[] # List where class details are stored
# create a loop that runs through all <div> elements
for row in soup.find_all('div', attrs={'class':'row-fluid header-row class-info'}):
    class_details.append(row.text.strip('\n '))
print(class_details)
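Another thing worth checking: when `attrs={'class': ...}` is given a space-separated string, BeautifulSoup compares it against the element's whole `class` attribute value, so a different class order or one extra class means no match. A CSS selector passed to `select()` requires each listed class but ignores order and extra classes. A small sketch on a made-up fragment (the markup is hypothetical, not taken from the UCLA page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same three classes as the query above, arranged differently
html = """
<div class="row-fluid header-row class-info">exact</div>
<div class="row-fluid class-info header-row">reordered</div>
<div class="row-fluid header-row class-info hide-small">extra class</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Multi-class string: compared against the full attribute value, in order
by_string = [
    d.get_text(strip=True)
    for d in soup.find_all("div", attrs={"class": "row-fluid header-row class-info"})
]

# CSS selector: every class must be present; order and extra classes don't matter
by_css = [
    d.get_text(strip=True)
    for d in soup.select("div.row-fluid.header-row.class-info")
]

print(by_string)  # ['exact']
print(by_css)     # ['exact', 'reordered', 'extra class']
```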

Upvotes: 0

Views: 113

Answers (1)

Millie Anne Volante

Reputation: 171

You are referencing the wrong class for the course details. Here's a script that prints what you need, the course title plus the time and day. There's no need for BeautifulSoup:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

PATH = "chromedriver.exe"

# Selenium 4 passes the driver path through a Service object
driver = webdriver.Chrome(service=Service(PATH))
driver.get("https://sa.ucla.edu/ro/Public/SOC/Results?t=19F&sBy=subject&sName=Computer+Science+%28COM+SCI%29&subj=COM+SCI&crsCatlg=Enter+a+Catalog+Number+or+Class+Title+%28Optional%29&catlg=&cls_no=&btnIsInIndex=btn_inIndex")

# Click to open all of the class title links
link = driver.find_element(By.LINK_TEXT, "Expand All Classes")
link.click()

course_name = []  # List where class names are stored
# Find all course titles, then print each title followed by its time entries
courseTitleElems = driver.find_elements(By.XPATH, "//h3[@class='head']")
for courseTitleElem in courseTitleElems:
    courseTitle = courseTitleElem.text
    course_name.append(courseTitle)
    # Time cells sit inside the expanded section that follows each title
    timeElems = courseTitleElem.find_elements(By.XPATH, "following-sibling::div/div[contains(@id,'children')]//div[contains(@class,'timeColumn')]")
    print("\n" + courseTitle)
    for timeElem in timeElems:
        # textContent returns the element's text even if it is not visibly rendered
        timeColumn = timeElem.get_attribute("textContent").strip("\n")
        print("\n" + timeColumn)

# Print list
print(course_name)
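The XPath above is evaluated relative to each `<h3>`: `following-sibling::div` steps to the `div` siblings after the title, `/div[contains(@id,'children')]` picks the child whose `id` contains `children`, and `//div[contains(@class,'timeColumn')]` collects every time cell below it. The sketch below runs the same expression with lxml against invented markup, assuming each title and its details sit together in their own wrapper element (the real page's structure is not shown here). The titles and time values are placeholders, and `times_by_title` is a hypothetical helper.

```python
from lxml import html

# Hypothetical markup: each course title and its details share one wrapper <div>
doc = html.fromstring("""
<div id="root">
  <div class="course">
    <h3 class="head">Course A</h3>
    <div>
      <div id="A-children">
        <div class="timeColumn">A-time-1</div>
        <div class="timeColumn">A-time-2</div>
      </div>
    </div>
  </div>
  <div class="course">
    <h3 class="head">Course B</h3>
    <div>
      <div id="B-children">
        <div class="timeColumn">B-time-1</div>
      </div>
    </div>
  </div>
</div>
""")

# Same relative expression as in the answer
RELATIVE = ("following-sibling::div/div[contains(@id,'children')]"
            "//div[contains(@class,'timeColumn')]")


def times_by_title(root):
    """Map each <h3 class="head"> title to the time cells found relative to it."""
    return {
        h3.text_content().strip(): [t.text_content().strip() for t in h3.xpath(RELATIVE)]
        for h3 in root.xpath("//h3[@class='head']")
    }


print(times_by_title(doc))
# {'Course A': ['A-time-1', 'A-time-2'], 'Course B': ['B-time-1']}
```

Note that `following-sibling::div` matches every later sibling `div`, not only the next one, and `contains(@class,'timeColumn')` is a substring test. If titles were not wrapped separately, `following-sibling::div[1]` would restrict the step to the immediately following `div`.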

Here's the printed result: [screenshot omitted]
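The question's end goal was a DataFrame rather than printed output. A minimal sketch, assuming the answer's loop is changed to append `(courseTitle, [time strings])` pairs to a list instead of printing them. `to_dataframe` is a hypothetical helper and the sample values are placeholders:

```python
import pandas as pd


def to_dataframe(courses):
    """Build one row per (course, time) pair.

    `courses` is an iterable of (title, times) tuples, e.g. collected in the
    answer's loop. A course with no time entries keeps a single row with None
    so it is not silently dropped.
    """
    rows = [
        {"course": title, "time": slot}
        for title, times in courses
        for slot in (times or [None])
    ]
    return pd.DataFrame(rows, columns=["course", "time"])


# Placeholder data standing in for scraped values
df = to_dataframe([("Course A", ["A-time-1", "A-time-2"]), ("Course B", [])])
print(df)
```

Splitting day and time into separate columns depends on the exact text format of the time cells, which isn't shown here, so that step is left out.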

Upvotes: 1
