sfactor

Reputation: 13062

How to download CSV data from a website using Selenium

I'm trying to learn how to get data from a website that loads its data into a table through some JavaScript. For example, the website is here

I used Selenium to get the data from the tables on that page. Here's the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.PhantomJS()
wait = WebDriverWait(browser, 10)
browser.get(url)     # using the page linked above

# wait for the table overlay to appear
wait.until(EC.presence_of_element_located(
                (By.ID, 'fancybox-outer')))

print("Page loaded")

# switch to the "All" tab
browser.find_element_by_xpath(
        '//div[contains(@class, "tabs")]/ul/li[text() = "All"]').click()

data_table = browser.find_element_by_xpath('//div[@class="grid-canvas"]')

# print the rows currently rendered in the grid
for rows in data_table.find_elements_by_xpath(
            '//div[contains(@class, "slick-row")]'):
    row = rows.text.split('\n')
    print(row)

However, it only gets partial data, since rows are loaded into the table dynamically as the table is scrolled. How do I get all the data from the "All" table while taking care of the scrolling?

There is also an "Export To CSV" button at the bottom that I could use to get the data I need, but calling click() on that button doesn't give me the CSV data in my code. If possible, getting this CSV would be better.

Upvotes: 3

Views: 8557

Answers (1)

alecxe

Reputation: 474131

Let's aim for getting the CSV file. The problem is that PhantomJS does not handle file downloads well; see Download file via hyperlink in PhantomJS using Selenium (things may have changed since then, though).

Anyway, let's grab the link to the CSV file and use urlretrieve() to download the file:

from urllib.parse import urljoin  # for Python2: from urlparse import urljoin
from urllib.request import urlretrieve

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


url = "https://www.draftkings.com/contest/draftteam/22264509"
browser = webdriver.PhantomJS()
wait = WebDriverWait(browser, 10)
browser.get(url)

# wait for page to load
wait.until(EC.presence_of_element_located((By.ID, 'fancybox-outer')))
print("Page loaded")

browser.find_element_by_xpath('//div[contains(@class, "tabs")]/ul/li[text() = "All"]').click()

# download the file
csv_url = urljoin(url, browser.find_element_by_css_selector("a.export-to-csv").get_attribute("href"))
urlretrieve(csv_url, "players.csv")
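
If the export link only works for the browser's session (for example, if it relies on cookies), urlretrieve() won't send those cookies and may get an empty or error response. A possible workaround, sketched here under the assumption that the requests package is installed and that cookies are all the endpoint needs, is to copy the Selenium cookies into a requests session:

import requests

# Sketch: reuse the Selenium session's cookies for the download.
# Adjust if the endpoint also needs specific headers.
session = requests.Session()
for cookie in browser.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get(csv_url)
with open("players.csv", "wb") as f:
    f.write(response.content)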

Upvotes: 2
