Csongor

Reputation: 593

Cannot get_attribute('href') from element via Selenium

I've been stuck on this for ages now.

I'm trying to build a scraper that collects the listings on this website, and I cannot get the URL of each listing. Can you please help?

I've tried numerous ways to locate the element; the latest one uses the absolute XPath (locating by class always failed as well).

The code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time

PATH = "/Users/csongordoma/Documents/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get('https://ingatlan.com/lista/elado+lakas+budapest')

data = {}
df = pd.DataFrame(columns=['Price', 'Address', 'Size', 'Rooms', 'URL'])

listings = driver.find_elements_by_css_selector('div.listing__card')
for listing in listings:
    data['Price'] = listing.find_elements_by_css_selector('div.price')[0].text
    data['Address'] = listing.find_elements_by_css_selector('div.listing__address')[0].text
#    data['Size'] = listing.find_elements_by_css_selector('div.listing__parameter listing__data--area-size')[0].text
    data['URL'] = listing.find_elements_by_xpath('/html[1]/body[1]/div[1]/div[2]/div[4]/div[1]/main[1]/div[1]/div[1]/div[1]/a[3]')[0].text
    df = df.append(data, ignore_index=True)

print(len(listings))
print(data)

#   driver.find_element_by_xpath("//a[. = 'Következő oldal']").click()

driver.quit()

The error message:

Traceback (most recent call last):
  File "hello.py", line 18, in <module>
    data['URL'] = listing.find_elements_by_xpath('/html[1]/body[1]/div[1]/div[2]/div[4]/div[1]/main[1]/div[1]/div[1]/div[1]/a[3]')[0].text
IndexError: list index out of range

Many thanks!

Upvotes: 0

Views: 102

Answers (1)

Arundeep Chohan

Reputation: 9969

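Something like the line below should work. It gets the a[2] WebElement from inside the listing element and reads its href attribute. Note the leading dot: when called on an element, an XPath that starts with // is still evaluated against the whole document, while .// searches only inside that element.

data['URL'] = listing.find_element_by_xpath('.//a[2]').get_attribute('href')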
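
The IndexError in the question arises because find_elements returns an empty list when nothing matches, so indexing it with [0] fails. A slightly more defensive variant is sketched below. It assumes each listing card contains at least one link carrying an href, which has not been verified against the actual page; the helper names extract_listing_url and collect_rows are made up for illustration. It uses the find_elements(by, value) call style of Selenium 4, where the string "xpath" is the value of By.XPATH.

```python
def extract_listing_url(listing):
    """Return the href of the first link inside `listing`, or None if there is none.

    `listing` is expected to be a Selenium WebElement (or anything exposing
    the same find_elements / get_attribute methods).
    """
    # ".//" keeps the search relative to this listing element.
    # "xpath" is the string value of selenium's By.XPATH constant.
    links = listing.find_elements("xpath", ".//a[@href]")
    if not links:
        # find_elements returns an empty list when nothing matches,
        # so check before indexing instead of raising IndexError.
        return None
    # get_attribute('href') reads the link target; .text would only
    # return the visible label of the link.
    return links[0].get_attribute("href")


def collect_rows(listings):
    """Build one dict per listing; hypothetical helper for illustration."""
    rows = []
    for listing in listings:
        rows.append({"URL": extract_listing_url(listing)})
    return rows
```

With a live driver this could be called as collect_rows(driver.find_elements("css selector", "div.listing__card")), and the resulting list of dicts passed to pd.DataFrame(rows) in one step rather than appending to the DataFrame row by row.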

Upvotes: 1
