Nakkhatra

Reputation: 65

Download all files with selenium python

I am trying to download all the images and annotations from this link: https://data.mendeley.com/datasets/pwyyg8zmk5/2

For example, I want to download all the images in the Bicycle folder. Each image file has its own download button. I tried this with Selenium using the XPath //a[@aria-label='Download file'], but it only downloads the first image. How can I download all of them? Is this possible with Selenium?

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(executable_path=r"F:\Chrome Driver\chromedriver.exe")

driver.get("https://data.mendeley.com/datasets/pwyyg8zmk5/2")

driver.maximize_window()

driver.implicitly_wait(20)

folder= driver.find_element_by_xpath("//span[@title='Bicycle']")
    
folder.click()

folder= driver.find_element_by_xpath("//span[@title='images']")
folder.click()


driver.implicitly_wait(10)
folder= driver.find_element_by_xpath("//a[@aria-label='Download file']")
folder.click()

Upvotes: 2

Views: 1914

Answers (2)

Joaquin

Reputation: 2101

Try this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver= webdriver.Chrome()

driver.get("https://data.mendeley.com/datasets/pwyyg8zmk5/2")

driver.maximize_window()

driver.implicitly_wait(20)

folder= driver.find_element_by_xpath("//span[@title='Bicycle']")
    
folder.click()

folder= driver.find_element_by_xpath("//span[@title='images']")
folder.click()


driver.implicitly_wait(10)
d_list = driver.find_elements_by_xpath("//a[@aria-label='Download file']")
for d in d_list:
    d.click()
    d_list.extend([a for a in driver.find_elements_by_xpath("//a[@aria-label='Download file']") if a not in d_list])

First, note that I used find_elements_by_xpath instead of find_element_by_xpath. It returns a list of every HTML element that matches //a[@aria-label='Download file'].

That list is incomplete because the page initially shows only the first 21 results, which is why you must add this line inside the loop:

d_list.extend([a for a in driver.find_elements_by_xpath("//a[@aria-label='Download file']") if a not in d_list])

It finds the new images that appear as you click each download link (each click scrolls the page down to the clicked element).
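If you would rather not extend a list while iterating over it, the same idea can be written as a small helper that re-queries the page after every pass and stops once nothing new shows up. This is only a sketch: click_all, find_elements, click and max_rounds are names I made up for illustration, not part of Selenium, and it assumes the elements you get back stay valid between queries.

```python
def click_all(find_elements, click, max_rounds=1000):
    """Click every element that find_elements() ever returns.

    find_elements: hypothetical zero-argument callable returning the
                   elements that are currently present on the page
    click:         hypothetical callable that clicks a single element
    max_rounds:    safety cap so a misbehaving page cannot loop forever
    Returns the number of distinct elements clicked.
    """
    seen = []  # elements already clicked, in click order
    for _ in range(max_rounds):
        clicked_this_round = 0
        # Re-query on every pass so lazily loaded entries are picked up.
        for element in find_elements():
            if element in seen:
                continue
            click(element)
            seen.append(element)
            clicked_this_round += 1
        if clicked_this_round == 0:
            break  # a full pass found nothing new, so we are done
    return len(seen)


# Possible use with the driver from the code above (assumption: same XPath):
# xpath = "//a[@aria-label='Download file']"
# click_all(lambda: driver.find_elements_by_xpath(xpath), lambda a: a.click())
```

Passing the lookup and the click in as callables keeps the loop independent of the browser, so you can try it against a fake page before pointing it at the real one.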

Upvotes: 2

sametatila

Reputation: 56

You can use this:

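x = 0
while True:
    x += 1
    try:
        # Single quotes around the XPath keep the inner double quotes valid.
        # Target the <a> itself: a plain /svg step does not match SVG nodes in an HTML page.
        folder = driver.find_element_by_xpath('//*[@id="main"]/div[2]/article/section[2]/div[3]/div/div[1]/div/div/div[' + str(x) + ']/a')
        folder.click()
    except Exception:
        # No link at index x (or the click failed): stop instead of looping forever.
        break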

Upvotes: -1
