Reputation: 1
Guys, I have a problem with the efficiency of my code.
I need to extract certain data from a web page listing land properties / apartments and analyse it later, but my code runs extremely slowly. Would you be so kind as to help me?
PS: I am new to web scraping.
driver.get('https://www.olx.pl/nieruchomosci/dzialki')
innerLayout = driver.find_element_by_id('innerLayout')
print(innerLayout)
container = innerLayout.find_element_by_id('body-container')
offer_wrap = container.find_elements_by_class_name("offer-wrapper")
for i in offer_wrap:
    link = driver.find_element_by_xpath('//*[@id="body-container"]/div[3]/div/div[1]/table[1]/tbody/tr[3]/td/div/table/tbody/tr[1]/td[2]/div/h3/a')
    link.click()
    outerClass = driver.find_element_by_id('offerdescription')
    time.sleep(10)
    #price of field
    parcel = outerClass.find_elements_by_xpath('//*[@id="offerdescription"]/div[2]/ul/li[3]/span/strong')
    price= []
    for i in parcel:
        price.append(i.text)
    time.sleep(10)
    #surface
    surface = outerClass.find_elements_by_xpath('//*[@id="offerdescription"]/div[2]/ul/li[4]/span/strong')
    surf = []
    for j in surface:
        surf.append(j.text)
    time.sleep(10)
    driver.back()
    print(price)
    print(surf)
Upvotes: 0
Views: 44
Reputation: 5396
Please avoid time.sleep(). It is a static wait: it always blocks for the full duration, even if the element is already visible and ready for interaction.
From your code, I could not see why a 10-second sleep is needed at any of those points.
Here is an example of replacing time.sleep with an explicit wait:
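from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.XPATH, "Your element Xpath here"))
)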
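To see why this is usually faster than a fixed sleep, here is a plain-Python sketch of the polling idea behind an explicit wait. It is not Selenium's actual implementation: wait_until and element_ready are hypothetical names, and the 0.3-second "page load" is simulated so the snippet runs without a browser.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return as soon as the condition holds, without waiting out the timeout.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated element that becomes available 0.3 s after the "page load" (assumption).
page_loaded_at = time.monotonic() + 0.3


def element_ready():
    return time.monotonic() >= page_loaded_at


start = time.monotonic()
wait_until(element_ready, timeout=5.0)
elapsed = time.monotonic() - start
print(f"explicit wait returned after {elapsed:.2f} s (timeout was 5 s)")
```

The timeout is only an upper bound: the wait ends shortly after the condition becomes true, whereas time.sleep(10) always costs the full 10 seconds per call.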
Also, most of your XPaths are absolute. Please use relative XPaths, which will make your script more stable.
Here are some better XPaths for you:
Your xpath : //*[@id="body-container"]/div[3]/div/div[1]/table[1]/tbody/tr[3]/td/div/table/tbody/tr[1]/td[2]/div/h3/a
Better xpath : (//table[@summary='Ogłoszenie']//tr//td//h3/a)[1]
Your xpath : //*[@id="offerdescription"]/div[2]/ul/li[3]/span/strong
Better xpath : (//span[@class='offer-details__name'])[3]
Your xpath : //*[@id="offerdescription"]/div[2]/ul/li[4]/span/strong
Better xpath : (//span[contains(@class,'name')])[4]
Please remember that optimizing the XPaths may not have much effect on the execution speed of your script, but it will definitely make the script more stable.
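The stability point can be illustrated with a small sketch using only the standard library. The markup below is a toy stand-in (an assumption, not OLX's real page), and xml.etree.ElementTree supports only a subset of XPath, so the third match is picked with a list index instead of a positional predicate on a group. The helper names find_absolute and find_relative are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy markup (assumption): the details list before a site redesign.
ORIGINAL = """
<div id="offerdescription">
  <div>header</div>
  <div>
    <ul>
      <li><span class="offer-details__name">Offer from</span></li>
      <li><span class="offer-details__name">Type</span></li>
      <li><span class="offer-details__name">Price per m2</span></li>
    </ul>
  </div>
</div>
"""

# The same content after the site wraps the list in an extra <section>.
REDESIGNED = ORIGINAL.replace("<ul>", "<section><ul>").replace(
    "</ul>", "</ul></section>"
)


def find_absolute(markup):
    """Position-based path: depends on the exact nesting of the page."""
    node = ET.fromstring(markup).find("./div[2]/ul/li[3]/span")
    return None if node is None else node.text


def find_relative(markup):
    """Attribute-based path: depends only on the class name."""
    matches = ET.fromstring(markup).findall(
        ".//span[@class='offer-details__name']"
    )
    return matches[2].text if len(matches) > 2 else None


print("absolute, original:  ", find_absolute(ORIGINAL))
print("absolute, redesigned:", find_absolute(REDESIGNED))
print("relative, original:  ", find_relative(ORIGINAL))
print("relative, redesigned:", find_relative(REDESIGNED))
```

In this toy case the position-based path stops matching as soon as one wrapper element is added, while the attribute-based path keeps finding the same element.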
Upvotes: 1