I'm trying to capture a unique URL using Python's Requests.
The source page is https://www.realestate.com.au/property/1-10-grosvenor-rd-terrigal-nsw-2260
The URL I want to capture is http://www.realestate.com.au/sold/property-unit-nsw-terrigal-124570934
When I tried:
(Unique_ID,) = (x.text_content() for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]'))
the CSV only contained the link text, "View Listing", rather than the URL.
Unless I'm mistaken, I've searched on the correct class, since the href alone would not be unique enough to select on. Do I need to do something different to capture URLs instead of text?
Full code below if required.
Thanks in advance.
import requests
import csv
import datetime
import pandas as pd
from lxml import html
# Read the property addresses from the spreadsheet
df = pd.read_excel(r"C:\Python27\Projects\REA_UNIQUE_ID\UN.xlsx", sheetname="UN")
dnc = df['Property']
dnc_list = list(dnc)
url_base = "https://www.realestate.com.au/property/"
URL_LIST = []
# Turn each address into a lowercase, hyphenated URL slug
for nd in dnc_list:
    nd = nd.strip()
    nd = nd.lower()
    nd = nd.replace(" ", "-")
    URL_LIST.append(url_base + nd)
# Only pages containing this text are scraped
text2search = '''The information provided'''
with open('Auctions.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)
    for index, url in enumerate(URL_LIST):
        page = requests.get(url)
        print '\r' 'Scraping URL ' + str(index+1) + ' of ' + str(len(URL_LIST)),
        if text2search in page.text:
            tree = html.fromstring(page.content)
            (title,) = (x.text_content() for x in tree.xpath('//title'))
            (Unique_ID,) = (x.text_content() for x in tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]'))
            #(sold,) = (x.text_content().strip() for x in tree.xpath('//p[@class="property-value__agent"]'))
            writer.writerow([title, Unique_ID])
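The address-to-URL loop in the script can be sketched as a small standalone helper (Python 3). The function name build_property_url and the sample addresses are hypothetical, since the real spreadsheet format isn't shown; like the original loop, it leaves characters such as "/" unchanged.

```python
# Base URL taken from the question; the helper name is hypothetical.
URL_BASE = "https://www.realestate.com.au/property/"


def build_property_url(address: str) -> str:
    """Turn a street address into a realestate.com.au property URL.

    Equivalent to the script's strip()/lower()/replace(" ", "-") steps,
    except that split()/join() also collapses runs of whitespace into a
    single hyphen (split() with no argument already drops leading and
    trailing whitespace, so no separate strip() is needed).
    """
    slug = "-".join(address.lower().split())
    return URL_BASE + slug


# Hypothetical spreadsheet values standing in for df['Property'].
addresses = [
    "1 10 Grosvenor Rd Terrigal NSW 2260 ",
    "1  10 Grosvenor Rd Terrigal NSW 2260",
]
url_list = [build_property_url(a) for a in addresses]
print(url_list[0])  # https://www.realestate.com.au/property/1-10-grosvenor-rd-terrigal-nsw-2260
```

Collapsing whitespace is a deliberate choice here: with replace(" ", "-"), a stray double space in the spreadsheet would produce "--" in the slug and a URL that likely doesn't exist.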
text_content() returns only an element's text, which is why you got "View Listing". To get the URL, select the link's @href attribute in the XPath instead, as below:
(Unique_ID,) = tree.xpath('//a[@class="property-value__link--muted rui-button-brand property-value__btn-listing"]/@href')
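A self-contained sketch of the difference, run against a small inline HTML snippet instead of the live page (Python 3 with lxml). The markup is hypothetical, modelled on the link described in the question. Whether the real href is relative or absolute is an assumption, so urllib.parse.urljoin is used to resolve it against the page URL either way.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical markup modelled on the question's link; the real page may differ.
SAMPLE_PAGE = """
<html><body>
<a class="property-value__link--muted rui-button-brand property-value__btn-listing"
   href="/sold/property-unit-nsw-terrigal-124570934">View Listing</a>
</body></html>
"""

PAGE_URL = "https://www.realestate.com.au/property/1-10-grosvenor-rd-terrigal-nsw-2260"
LINK_XPATH = ('//a[@class="property-value__link--muted rui-button-brand '
              'property-value__btn-listing"]')

tree = html.fromstring(SAMPLE_PAGE)

# text_content() gives the visible link text: what ended up in the CSV.
(link_text,) = (a.text_content() for a in tree.xpath(LINK_XPATH))

# Appending /@href makes xpath() return the attribute values (strings), not elements.
(href,) = tree.xpath(LINK_XPATH + "/@href")

# urljoin resolves a relative href against the page URL; an absolute href is returned as-is.
listing_url = urljoin(PAGE_URL, str(href))

print(link_text)    # View Listing
print(listing_url)  # https://www.realestate.com.au/sold/property-unit-nsw-terrigal-124570934
```

Note that the (x,) = ... unpacking raises ValueError if the XPath matches zero or several links, so pages without a listing link will stop the loop unless that case is handled.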