Reputation: 65
I am trying to download all the images and annotations from this link: https://data.mendeley.com/datasets/pwyyg8zmk5/2
For example, I want to download all the images in the Bicycle folder. Each image file has its own download button. I tried to click it with Selenium using the XPath //a[@aria-label='Download file'], but only the first image is downloaded. How can I download all of them? Is it possible with Selenium?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver= webdriver.Chrome(executable_path=r"F:\Chrome Driver\chromedriver.exe")
driver.get("https://data.mendeley.com/datasets/pwyyg8zmk5/2")
driver.maximize_window()
driver.implicitly_wait(20)
folder= driver.find_element_by_xpath("//span[@title='Bicycle']")
folder.click()
folder= driver.find_element_by_xpath("//span[@title='images']")
folder.click()
driver.implicitly_wait(10)
folder= driver.find_element_by_xpath("//a[@aria-label='Download file']")
folder.click()
Upvotes: 2
Views: 1914
Reputation: 2101
Try this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver= webdriver.Chrome()
driver.get("https://data.mendeley.com/datasets/pwyyg8zmk5/2")
driver.maximize_window()
driver.implicitly_wait(20)
folder= driver.find_element_by_xpath("//span[@title='Bicycle']")
folder.click()
folder= driver.find_element_by_xpath("//span[@title='images']")
folder.click()
driver.implicitly_wait(10)
d_list = driver.find_elements_by_xpath("//a[@aria-label='Download file']")
for d in d_list:
    d.click()
    # Clicking scrolls the list, so pick up any download links that have appeared since
    d_list.extend([a for a in driver.find_elements_by_xpath("//a[@aria-label='Download file']") if a not in d_list])
First, note that I used find_elements_by_xpath instead of find_element_by_xpath. It returns a list of every HTML element that matches //a[@aria-label='Download file'].
That list is incomplete because the page initially shows only the first 21 results, which is why you must add this line inside the loop:
d_list.extend([a for a in driver.find_elements_by_xpath("//a[@aria-label='Download file']") if a not in d_list])
It picks up the new images that appear as you click the download links, because each click scrolls the file list further down.
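The same idea can also be written as an explicit "repeat until nothing new appears" loop, which avoids growing the list while iterating over it. This is only a sketch: the helper name click_all_downloads and the pause and max_rounds parameters are made up for illustration and are not part of Selenium. It uses driver.find_elements("xpath", ...) because the find_elements_by_xpath shortcut is no longer available in recent Selenium 4 releases.

```python
import time

XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH
DOWNLOAD_LINKS = "//a[@aria-label='Download file']"


def click_all_downloads(driver, xpath=DOWNLOAD_LINKS, pause=1.0, max_rounds=200):
    """Click every download link, re-querying until no new links show up.

    pause and max_rounds are illustrative safety values, not tuned for this site.
    Returns the number of links clicked.
    """
    clicked = []  # links already clicked; membership uses element equality
    for _ in range(max_rounds):
        # Re-query the page and keep only links we have not clicked yet
        new_links = [a for a in driver.find_elements(XPATH, xpath) if a not in clicked]
        if not new_links:
            break  # nothing new was rendered: assume the list is complete
        for link in new_links:
            link.click()
            clicked.append(link)
        time.sleep(pause)  # give the page a moment to render more rows
    return len(clicked)
```

After opening the images folder as in the code above, you would call click_all_downloads(driver). If the page needs more time to render new rows, a larger pause may help.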
Upvotes: 2
Reputation: 56
You can use this:
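from selenium.common.exceptions import NoSuchElementException

x = 0
while True:
    x += 1
    try:
        # Position-based XPath to the x-th download icon in the file list
        folder = driver.find_element_by_xpath("//*[@id='main']/div[2]/article/section[2]/div[3]/div/div[1]/div/div/div[" + str(x) + "]/a/svg")
        folder.click()
    except NoSuchElementException:
        # No x-th entry was found, so stop instead of looping forever
        break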
Upvotes: -1