Moran Reznik

Reputation: 1371

Can't get all image URLs right in Python Selenium

For a personal project, I am trying to scrape this webpage:

https://www.ebay.com/b/Jordan-11-Retro-Cool-Grey-2001/15709/bn_7117643306

I want to get all img URLs using Selenium.

Here is the code:

url = 'https://www.ebay.com/b/Jordan-11-Retro-Cool-Grey-2001/15709/bn_7117643306'

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# open url
browser = webdriver.Chrome('/Users/mreznik/V5/chromedriver')
browser.implicitly_wait(2)
browser.get(url)

elems = browser.find_elements_by_tag_name("img")
for elem in elems:
    print(elem.get_attribute('src'))

It gives me a list of results:

...
https://i.ebayimg.com/thumbs/images/g/M-sAAOSwahdgrd0x/s-l300.webp
https://i.ebayimg.com/thumbs/images/g/bpUAAOSwoa9gtlWw/s-l300.webp
https://ir.ebaystatic.com/cr/v/c1/s_1x2.gif
...

As one can see by running this, there are listings on the page whose image URLs are not in the list, and stranger yet, the list contains image URLs that are not on the page!

How can I get this right?

Upvotes: 2

Views: 1222

Answers (1)

Prophet

Reputation: 33361

You should select only the elements that contain product images.
Please try this:

product_img_xpath = '//div[contains(@class,"s-item")]//img'
elems = browser.find_elements_by_xpath(product_img_xpath)
for elem in elems:
    print(elem.get_attribute('src'))

Don't forget to add a delay or an explicit wait before fetching the list of elements, something like this:

import time  # needed for time.sleep() below

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 20)

product_img_xpath = '//div[contains(@class,"s-item")]//img'
wait.until(EC.visibility_of_element_located((By.XPATH, product_img_xpath)))
time.sleep(1)

imgs = browser.find_elements_by_xpath(product_img_xpath)
for img in imgs:
    print(img.get_attribute('src'))
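
As a possible alternative to a fixed sleep, `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value once the condition holds. Below is a minimal sketch of such a condition; the helper name `min_elements_present` and the threshold you pass to it are hypothetical and not part of Selenium, and the right minimum depends on the page.

```python
def min_elements_present(xpath, minimum):
    """Build a wait condition for WebDriverWait.until (hypothetical helper).

    The condition returns the matched elements once at least `minimum`
    of them are present, and False otherwise so the wait keeps polling.
    """
    def condition(driver):
        # "xpath" is the string value of selenium's By.XPATH constant
        found = driver.find_elements("xpath", xpath)
        return found if len(found) >= minimum else False
    return condition
```

It could be used as `imgs = wait.until(min_elements_present(product_img_xpath, 10))`, where 10 is an arbitrary example value. Note that this only waits for the elements to exist; it does not guarantee that their `src` attributes have been filled in.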

UPD
If you are still not getting all the elements in the list, try scrolling to each element before accessing its properties.

import time  # needed for time.sleep() below

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)

product_img_xpath = '//div[contains(@class,"s-item")]//img'
wait.until(EC.visibility_of_element_located((By.XPATH, product_img_xpath)))
time.sleep(1)

imgs = browser.find_elements_by_xpath(product_img_xpath)
for img in imgs:
    actions.move_to_element(img).perform()
    print(img.get_attribute('src'))
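
The `s_1x2.gif` entry in the question's output looks like a placeholder that is shown until the real image is loaded, which would explain why scrolling helps. The sketch below filters such placeholders out of the collected URLs. It rests on assumptions: the helper names are hypothetical, and the idea that the real URL may sit in a second attribute such as `data-src` is a guess about the page markup that should be checked in the browser's inspector.

```python
# Placeholder URL taken from the output shown in the question
PLACEHOLDER_URL = "https://ir.ebaystatic.com/cr/v/c1/s_1x2.gif"


def resolve_image_url(src, lazy_src=None):
    """Return the real image URL, or None if only the placeholder is known."""
    if src and src != PLACEHOLDER_URL:
        return src
    # Fall back to the lazy-load attribute value (assumed, e.g. data-src)
    return lazy_src or None


def collect_image_urls(pairs):
    """Resolve (src, lazy_src) pairs, dropping placeholders and duplicates."""
    urls = []
    seen = set()
    for src, lazy_src in pairs:
        url = resolve_image_url(src, lazy_src)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

With Selenium, the pairs could be built as `[(img.get_attribute('src'), img.get_attribute('data-src')) for img in imgs]`; `get_attribute` returns `None` for a missing attribute, which the helper treats as "not available".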

Upvotes: 2
