Reputation: 13062
I'm trying to learn how to get data from a website that loads data into a table through some JavaScript. For example, the website is here.
I used Selenium to get the data from the tables here. Here's the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.PhantomJS()
wait = WebDriverWait(browser, 10)
browser.get(url)  # using the page linked above

# wait for the contest pop-up to load
wait.until(EC.presence_of_element_located((By.ID, 'fancybox-outer')))
print("Page loaded")

# switch to the "All" tab
browser.find_element_by_xpath(
    '//div[contains(@class, "tabs")]/ul/li[text() = "All"]').click()

# read whatever rows are currently rendered in the grid
data_table = browser.find_element_by_xpath('//div[@class="grid-canvas"]')
for rows in data_table.find_elements_by_xpath(
        '//div[contains(@class, "slick-row")]'):
    row = rows.text.split('\n')
    print(row)
However, it only gets partial data, since rows are loaded into the table dynamically as the table is scrolled. How do I get the data from the "All" table while taking care of the scrolling?
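For illustration, here is a rough, untested sketch of the scrolling approach I have in mind (the div.slick-viewport selector is a guess about the grid markup):

import time

# untested sketch: scroll the (assumed) virtualized viewport step by step
# and collect whatever rows are rendered at each position
viewport = browser.find_element_by_css_selector('div.slick-viewport')
collected = []
while True:
    for rendered_row in browser.find_elements_by_css_selector('div.slick-row'):
        text = rendered_row.text
        if text not in collected:
            collected.append(text)
    at_bottom = browser.execute_script(
        "return arguments[0].scrollTop + arguments[0].clientHeight"
        " >= arguments[0].scrollHeight;", viewport)
    if at_bottom:
        break
    # scroll down one viewport height and give the grid time to render rows
    browser.execute_script(
        "arguments[0].scrollTop += arguments[0].clientHeight;", viewport)
    time.sleep(0.5)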
There is also an "Export To CSV" button at the bottom that I could use to get the data I need, but calling click() on that button doesn't give me the CSV data in my code. If possible, getting this CSV would be better.
Upvotes: 3
Views: 8557
Reputation: 474131
Let's aim to get the CSV file. The problem is that PhantomJS
does not deal well with file downloads; see Download file via hyperlink in PhantomJS using Selenium (things might've changed since, though).
Anyway, let's grab the link to the CSV file and use urlretrieve()
to download the file:
from urllib.parse import urljoin # for Python2: from urlparse import urljoin
from urllib.request import urlretrieve
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://www.draftkings.com/contest/draftteam/22264509"
browser = webdriver.PhantomJS()
wait = WebDriverWait(browser, 10)
browser.get(url)
# wait for page to load
wait.until(EC.presence_of_element_located((By.ID, 'fancybox-outer')))
print("Page loaded")
browser.find_element_by_xpath('//div[contains(@class, "tabs")]/ul/li[text() = "All"]').click()
# download the file
csv_url = urljoin(url, browser.find_element_by_css_selector("a.export-to-csv").get_attribute("href"))
urlretrieve(csv_url, "players.csv")
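Note that urlretrieve() opens a fresh connection without the PhantomJS cookies, so if the export link only works inside the logged-in session, the download may come back empty. Here is a sketch of the same download with the browser's cookies copied into a requests session (requests is an extra dependency, not needed for the basic approach above):

import requests

# reuse the browser's cookies so the download happens in the same session
session = requests.Session()
for cookie in browser.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get(csv_url)
response.raise_for_status()
with open("players.csv", "wb") as f:
    f.write(response.content)

Either way, once players.csv is on disk it can be read with the csv module or pandas as usual.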
Upvotes: 2