Reputation: 1
I'm trying to grab information from the NYSE, specifically the elements with class "flex_tr" on https://www.nyse.com/quote/XNGS:AAPL. The HTML path to them is:
html->body->div->div.sticky-header__main->div.landing-section->div.idc-container->div->div->div.row->div.col-lg-12.col-md-12->div.d-widget.d-vbox.d-flex1.DataTable-nyse->div.d-container.d-flex1.d-vbox.d-nowrap.d-justify-start.data-table-container.d-noscroll->div.d-flex1->div.d-vbox->div.d-flex-1.d-scroll-y->div.contentContainer->div.flex_tr
There are a ton of rows this should be grabbing, but I'm currently unable to get the contents of any of them. I've tried soup.find_all("div", class_="flex_tr")
and also soup.find_all("div", {"class": "flex_tr"})
and neither returns anything.
from selenium import webdriver
from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path aren't treated as escape sequences
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
driver.get('https://www.nyse.com/quote/XNGS:AAPL')

# Grab the rendered HTML and parse it with BeautifulSoup
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
flex_tr = soup.find_all(class_="flex_tr")
print(flex_tr)

driver.close()
Upvotes: 0
Views: 440
Reputation: 199
It looks like you're grabbing the page source (and closing the driver) before the dynamically loaded content has rendered, which is why find_all returns an empty list.
Selenium includes a few modules that let you wait for an element to load. This question talks more about this: Wait until page is loaded with Selenium WebDriver for Python
As for your question, I was able to get this working with the following (it mirrors the top answer in the aforementioned link):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

driver = webdriver.Chrome('./chromedriver')
driver.get('https://www.nyse.com/quote/XNGS:AAPL')

delay = 5  # seconds to wait before WebDriverWait times out

try:
    # Block until at least one element with class "flex_tr" is present in the DOM
    WebDriverWait(driver, delay).until(
        expected_conditions.presence_of_element_located((By.CLASS_NAME, "flex_tr"))
    )
    content = driver.page_source
    soup = BeautifulSoup(content, 'html.parser')
    flex_tr = soup.find_all(class_="flex_tr")
    print(flex_tr)
except TimeoutException:
    print("Timeout")
finally:
    driver.close()
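As a side note, once the wait succeeds you don't strictly need BeautifulSoup: Selenium can read the rows directly. A minimal sketch, assuming each "flex_tr" div carries the visible text of one table row (replace the print with whatever processing you need):

rows = driver.find_elements(By.CLASS_NAME, "flex_tr")
for row in rows:
    # .text is the concatenated visible text of the row's cells
    print(row.text)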
Upvotes: 1