Jan Maniawski

Reputation: 1

Problems with Selenium, especially with optimization

I have a performance problem with my code.

I need to extract certain data from a web page listing land plots and apartments and analyse it later, but my code runs extremely slowly. Would you be so kind as to help me?

PS: I am new to web scraping.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.olx.pl/nieruchomosci/dzialki')

innerLayout = driver.find_element(By.ID, 'innerLayout')
print(innerLayout)
container = innerLayout.find_element(By.ID, 'body-container')
offer_wrap = container.find_elements(By.CLASS_NAME, "offer-wrapper")

# results are collected across all offers, so the lists are created once, outside the loop
price = []
surf = []

for offer in offer_wrap:
    link = driver.find_element(By.XPATH, '//*[@id="body-container"]/div[3]/div/div[1]/table[1]/tbody/tr[3]/td/div/table/tbody/tr[1]/td[2]/div/h3/a')
    link.click()

    outerClass = driver.find_element(By.ID, 'offerdescription')

    time.sleep(10)
    # price of field
    parcel = outerClass.find_elements(By.XPATH, '//*[@id="offerdescription"]/div[2]/ul/li[3]/span/strong')
    for p in parcel:
        price.append(p.text)
    time.sleep(10)

    # surface
    surface = outerClass.find_elements(By.XPATH, '//*[@id="offerdescription"]/div[2]/ul/li[4]/span/strong')
    for s in surface:
        surf.append(s.text)

    time.sleep(10)
    driver.back()

print(price)
print(surf)

Upvotes: 0

Views: 44

Answers (2)

Helping Hands

Reputation: 5396

Please avoid time.sleep(). It is a static wait: it always waits for the full duration, even when your element is already visible and ready for interaction.

Based on your code, I could not see any reason for a 10-second sleep at any of those points.
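To put a number on it: your loop calls time.sleep(10) three times per offer, so it spends at least 30 seconds per offer doing nothing, however fast the page loads. A quick back-of-the-envelope calculation (the count of 40 offers is a hypothetical page size for illustration, not a measured OLX value):

```python
SLEEPS_PER_OFFER = 3  # time.sleep(10) appears three times inside the loop
SLEEP_SECONDS = 10


def fixed_wait_seconds(offers: int) -> int:
    """Minimum time the loop spends sleeping, no matter how fast the pages load."""
    return offers * SLEEPS_PER_OFFER * SLEEP_SECONDS


# 40 is a hypothetical number of offers, used only for illustration
total = fixed_wait_seconds(40)
print(f"{total} s = {total / 60:.0f} min of pure sleeping")  # 1200 s = 20 min of pure sleeping
```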

Here is an example of replacing time.sleep() with an explicit wait:

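from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
element = WebDriverWait(driver, 5).until(  # returns as soon as the element is present, waits at most 5 s
    EC.presence_of_element_located((By.XPATH, "Your element Xpath here"))
)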
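The explicit wait is faster because it polls: it checks the condition repeatedly, returns as soon as the condition holds, and fails only if the timeout expires. Below is a minimal pure-Python sketch of that polling idea, for illustration only; wait_until and the simulated element_ready check are hypothetical, not Selenium code (the real WebDriverWait polls every 0.5 s by default and raises TimeoutException when the timeout expires).

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], object], timeout: float = 5.0, poll: float = 0.5) -> object:
    """Hypothetical helper: call `condition` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition met: return immediately, no leftover waiting
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # short pause between checks instead of one long fixed sleep


# Simulated page element (hypothetical): it "appears" on the third check
checks = 0


def element_ready():
    global checks
    checks += 1
    return "offerdescription" if checks >= 3 else None


start = time.monotonic()
found = wait_until(element_ready, timeout=5.0, poll=0.1)
elapsed = time.monotonic() - start
print(found, f"found after {elapsed:.1f} s")  # about 0.2 s instead of a fixed 10 s
```

The waiting time adapts to the page: a fast page costs almost nothing, while a slow one is still bounded by the timeout.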

Also, most of your XPaths are absolute. Please use relative XPaths, which will make your script more stable.

Here are some better XPaths for you:

Your xpath : //*[@id="body-container"]/div[3]/div/div[1]/table[1]/tbody/tr[3]/td/div/table/tbody/tr[1]/td[2]/div/h3/a
Better xpath : (//table[@summary='Ogłoszenie']//tr//td//h3/a)[1]


Your xpath : //*[@id="offerdescription"]/div[2]/ul/li[3]/span/strong
Better xpath : (//span[@class='offer-details__name'])[3]


Your xpath : //*[@id="offerdescription"]/div[2]/ul/li[4]/span/strong
Better xpath : (//span[contains(@class,'name')])[4]
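Why relative XPaths are more stable: an absolute path spells out every ancestor and position, so a single extra wrapper element breaks it, while a path anchored on an attribute such as summary='Ogłoszenie' still matches. A small illustration with Python's built-in xml.etree.ElementTree, which supports only a subset of XPath (no contains() and no (...)[1] grouping, so the sketch uses .find() to take the first match instead). The markup is a hypothetical, simplified stand-in, not the real OLX page:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (not the real OLX page)
ORIGINAL = (
    '<div id="body-container"><div>'
    '<table summary="Ogłoszenie"><tr><td><h3><a href="/offer/1">Plot A</a></h3></td></tr></table>'
    '</div></div>'
)
# The same markup after the site wraps the table in one extra <div>
REDESIGNED = (
    '<div id="body-container"><div><div class="wrapper">'
    '<table summary="Ogłoszenie"><tr><td><h3><a href="/offer/1">Plot A</a></h3></td></tr></table>'
    '</div></div></div>'
)

# Position-based path from the container down, in the style of the original XPath
ABSOLUTE = "./div[1]/table[1]/tr[1]/td[1]/h3/a"
# Attribute-anchored relative path, in the style of the suggested XPath
RELATIVE = ".//table[@summary='Ogłoszenie']//h3/a"


def first_link(markup: str, path: str):
    """Return the href of the first element matching `path`, or None if nothing matches."""
    element = ET.fromstring(markup).find(path)
    return None if element is None else element.get("href")


for name, markup in [("original", ORIGINAL), ("redesigned", REDESIGNED)]:
    print(name, first_link(markup, ABSOLUTE), first_link(markup, RELATIVE))
# original /offer/1 /offer/1
# redesigned None /offer/1
```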

Please remember that optimizing your XPaths may not noticeably affect the execution speed of your script, but it will definitely make the script more stable.

Upvotes: 1

MetraDZ

Reputation: 39

I understand why you are using time.sleep(), but you had better avoid it. Try using WebDriverWait instead. You can find it here

Upvotes: 0
