Cannot get a specific href out of requests

Question

I'm trying to capture a unique URL using Python's Requests library.

The source website is https://www.realestate.com.au/property/1-10-grosvenor-rd-terrigal-nsw-2260

The goal URL is http://www.realestate.com.au/sold/property-unit-nsw-terrigal-124570934

When I tried:

(Unique_ID,) = (x.text_content() for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]'))

the CSV contained only the link text, View Listing.

Unless I'm mistaken, I've searched on the correct class, since the href itself would not be unique enough to match on. Am I supposed to do something different to capture URLs instead of text?

Full code below if required.

Thanks in advance.

import csv

import pandas as pd
import requests
from lxml import html

# Raw string so the backslashes in the Windows path are taken literally
df = pd.read_excel(r"C:\Python27\Projects\REA_UNIQUE_ID\UN.xlsx", sheetname="UN")
dnc = df['Property']
dnc_list = list(dnc)
url_base = "https://www.realestate.com.au/property/"
URL_LIST = []

for nd in dnc_list:
    nd = nd.strip()
    nd = nd.lower()
    nd = nd.replace(" ", "-")
    URL_LIST.append(url_base + nd)

text2search = '''The information provided'''

with open('Auctions.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)

    for index, url in enumerate(URL_LIST):
        page = requests.get(url)
        print '\n' 'Scraping URL ' + str(index+1) + ' of ' + str(len(URL_LIST)),

        if text2search in page.text:
            tree = html.fromstring(page.content)
            (title,) = (x.text_content() for x in tree.xpath('//title'))
            (Unique_ID,) = (x.text_content() for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]'))
            #(sold,) = (x.text_content().strip() for x in tree.xpath('//p[@class="property-value__agent"]'))
            writer.writerow([title, Unique_ID])

Andersson · Accepted Answer

text_content() returns only the text inside the element, which is why you get View Listing. To get the link itself, select the @href attribute in the XPath instead:

(Unique_ID,) = (x for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]/@href'))
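To see the difference without hitting the site, here is a minimal sketch run against an inline HTML snippet. The SAMPLE markup is a hypothetical stand-in for the property page (the real page may be structured differently), and the relative href value is an assumption based on the goal URL in the question. It is written for Python 3, but the lxml calls are the same in Python 2.

```python
from lxml import html

# Hypothetical stand-in for the property page; the real markup may differ.
SAMPLE = '''
<html><body>
  <a class="property-value__link--muted rui-button-brand property-value__btn-listing"
     href="/sold/property-unit-nsw-terrigal-124570934">View Listing</a>
</body></html>
'''

tree = html.fromstring(SAMPLE)
link_xpath = ('//a[@class="property-value__link--muted rui-button-brand '
              'property-value__btn-listing"]')

# text_content() gives the visible link text only
texts = [a.text_content() for a in tree.xpath(link_xpath)]

# Appending /@href makes the XPath return the attribute values as strings
hrefs = tree.xpath(link_xpath + '/@href')

# Looser variant: match on a single class name instead of the full class string
hrefs_loose = tree.xpath('//a[contains(@class, "property-value__btn-listing")]/@href')

# Taking the first match (or None) avoids the ValueError that tuple
# unpacking raises when the XPath matches zero or several elements.
first_href = hrefs[0] if hrefs else None

print(texts)
print(first_href)
```

Note that the (Unique_ID,) = ... unpacking only works when exactly one element matches, so on pages where the link is missing, a check like the first_href line may be the safer choice. Also note that contains(@class, ...) is a plain substring test, so it would match any class value containing that text.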
