Reputation: 593
I've been stuck on this for eons now... I'm trying to build a scraper that collects the listings on this website, and I just cannot for the life of me get the URL of each listing. Can you please help?
I've tried numerous ways to locate the element; this latest attempt uses the absolute XPath (locating by class always failed as well).
The code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time

PATH = "/Users/csongordoma/Documents/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get('https://ingatlan.com/lista/elado+lakas+budapest')

data = {}
df = pd.DataFrame(columns=['Price', 'Address', 'Size', 'Rooms', 'URL'])
listings = driver.find_elements_by_css_selector('div.listing__card')

for listing in listings:
    data['Price'] = listing.find_elements_by_css_selector('div.price')[0].text
    data['Address'] = listing.find_elements_by_css_selector('div.listing__address')[0].text
    # data['Size'] = listing.find_elements_by_css_selector('div.listing__parameter listing__data--area-size')[0].text
    data['URL'] = listing.find_elements_by_xpath('/html[1]/body[1]/div[1]/div[2]/div[4]/div[1]/main[1]/div[1]/div[1]/div[1]/a[3]')[0].text
    df = df.append(data, ignore_index=True)

print(len(listings))
print(data)
# driver.find_element_by_xpath("//a[. = 'Következő oldal']").click()
driver.quit()
The error message:
Traceback (most recent call last):
File "hello.py", line 18, in <module>
data['URL'] = listing.find_elements_by_xpath('/html[1]/body[1]/div[1]/div[2]/div[4]/div[1]/main[1]/div[1]/div[1]/div[1]/a[3]')[0].text
IndexError: list index out of range
Many thanks!
Upvotes: 0
Views: 102
Reputation: 9969
Something like the below should work: it grabs the a[2] web element relative to each listing and reads its href. Your absolute XPath evidently matches nothing in the rendered page, so find_elements returns an empty list and indexing [0] raises the IndexError; the leading . below keeps the search scoped to the current listing instead.
data['URL'] = listing.find_element_by_xpath('.//a[2]').get_attribute('href')
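Folded into the loop from the question, a minimal sketch might look like this. It keeps the same Selenium 3-style find_elements_* API and the class names from your code; treating the first <a> inside each div.listing__card as the listing link is an assumption you may need to adjust against the live markup.

from selenium import webdriver
import pandas as pd

PATH = "/Users/csongordoma/Documents/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get('https://ingatlan.com/lista/elado+lakas+budapest')

rows = []
listings = driver.find_elements_by_css_selector('div.listing__card')
for listing in listings:
    # .// keeps the XPath search inside the current card rather than the whole page;
    # which anchor actually holds the listing link (a[1], a[2], ...) is an assumption to verify.
    anchors = listing.find_elements_by_xpath('.//a')
    rows.append({
        'Price': listing.find_elements_by_css_selector('div.price')[0].text,
        'Address': listing.find_elements_by_css_selector('div.listing__address')[0].text,
        'URL': anchors[0].get_attribute('href') if anchors else None,
    })

# Build the DataFrame once at the end instead of appending row by row.
df = pd.DataFrame(rows, columns=['Price', 'Address', 'Size', 'Rooms', 'URL'])
print(len(listings))
print(df.head())
driver.quit()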
Upvotes: 1