James2086

Reputation: 231

Cannot get a specific href out of requests

I'm trying to capture a unique URL using Python's Requests.

The source page is https://www.realestate.com.au/property/1-10-grosvenor-rd-terrigal-nsw-2260

The goal URL is http://www.realestate.com.au/sold/property-unit-nsw-terrigal-124570934

When I tried

(Unique_ID,) = (x.text_content() for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]'))

the CSV contained the link text, View Listing, rather than the URL.

Unless I'm mistaken, I've searched on the correct class, since the href itself would not be unique enough to search on. Am I supposed to do something different to capture URLs instead of text?

Full code below if required.

Thanks in advance.

import requests
import csv
import datetime
import pandas as pd
from lxml import html

df = pd.read_excel(r"C:\Python27\Projects\REA_UNIQUE_ID\UN.xlsx", sheetname="UN")
dnc = df['Property']
dnc_list = list(dnc)
url_base = "https://www.realestate.com.au/property/"
URL_LIST = []

for nd in dnc_list:
    nd = nd.strip()
    nd = nd.lower()
    nd = nd.replace(" ", "-")
    URL_LIST.append(url_base + nd)

text2search = '''The information provided'''

with open('Auctions.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)

    for index, url in enumerate(URL_LIST):
        page = requests.get(url)
        print '\r' 'Scraping URL ' + str(index+1) + ' of ' + str(len(URL_LIST)),

        if text2search in page.text:
            tree = html.fromstring(page.content)
            (title,) = (x.text_content() for x in tree.xpath('//title'))
            (Unique_ID,) = (x.text_content() for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]'))
            #(sold,) = (x.text_content().strip() for x in tree.xpath('//p[@class="property-value__agent"]'))
            writer.writerow([title, Unique_ID])

Upvotes: 0

Views: 43

Answers (1)

Andersson

Reputation: 52665

text_content() returns only the text of an element, which is why you got View Listing. To get the URL, select the @href attribute instead, as below:

(Unique_ID,) = (x for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]/@href'))
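
To see the difference in isolation, here is a minimal sketch run against a hand-written HTML snippet. The SAMPLE markup is an assumption modelled on the class names and goal URL from the question, not a copy of the real page, and the names SAMPLE and LINK_XPATH are made up for illustration. It shows text_content() returning the link text, the /@href step returning the attribute value, and .get('href') on the element as an equivalent alternative.

```python
from lxml import html

# Hypothetical markup modelled on the link described in the question;
# the real page may be structured differently.
SAMPLE = '''
<html><body>
  <a class="property-value__link--muted rui-button-brand property-value__btn-listing"
     href="http://www.realestate.com.au/sold/property-unit-nsw-terrigal-124570934">View Listing</a>
</body></html>
'''

# Same exact-match class selector as in the question.
LINK_XPATH = ('//a[@class="property-value__link--muted rui-button-brand '
              'property-value__btn-listing"]')

tree = html.fromstring(SAMPLE)

# text_content() gives the visible text of each matched element.
link_texts = [a.text_content() for a in tree.xpath(LINK_XPATH)]

# Appending /@href makes the XPath return the attribute values as strings.
hrefs = tree.xpath(LINK_XPATH + '/@href')

# Equivalent: select the elements, then read the attribute with .get().
hrefs_via_get = [a.get('href') for a in tree.xpath(LINK_XPATH)]

print(link_texts)
print(hrefs)
```

Note that the (Unique_ID,) = ... unpacking raises a ValueError unless exactly one link matches, so keeping the result as a list and checking its length may be safer on pages where the button is missing.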

Upvotes: 1
