Reputation: 341
I am trying to scrape information from the following website: https://www.rawson.co.za
However, the information sometimes changes its position. I am struggling to match only the 'Building Size' entry and store it as the size, since the div looks like this:
<div class="features__item">
<div class="features__icon icon-house" aria-hidden="true"></div>
<div class="features__label">Building Size 130m²</div>
</div>
I am able to extract it, but the expression sometimes picks up other information, either because the property has no building size or because something else occupies that position.
This is what I have for size now (I am accessing the information from the child/property pages):
size = response.xpath("//div[@class='features']/div[@class='features__list']/div[@class='row']/div[@class='col col--1-2'][2]/div[@class='features__item'][1]/div[@class='features__label']/text()").re(r'\d+')[0]
What I would like is to take the Building Size value (numbers only) if it exists, and to store None if no building size is available. I am struggling with matching on the text inside the div. I have tried to construct a for loop that checks whether the text contains 'Building Size', but nothing has worked yet. Any help would be very much appreciated! Thank you!
Upvotes: 1
Views: 218
Reputation: 10666
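Simple: select the label by its text instead of its position. `re_first` returns None when nothing matches, whereas `.re(...)[0]` raises an IndexError: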
size = response.xpath("//div[@class='features__label'][contains(., 'Building Size')]/text()").re_first(r'\d+')
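To see why matching on the text is more robust than matching on the position, here is a minimal standalone sketch of the same logic. It does not use Scrapy: it swaps in the standard library's `xml.etree.ElementTree` and `re` so it can run on its own. The function name `building_size`, the two sample snippets, and the "Bedrooms 3" entry are made up for illustration; only the "Building Size 130m²" markup comes from the question.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup. Only the "Building Size" item is taken from
# the question; the "Bedrooms" item is invented to show a shifted position.
SAMPLE_WITH_SIZE = """
<div class="features">
  <div class="features__item">
    <div class="features__icon icon-bed" aria-hidden="true"></div>
    <div class="features__label">Bedrooms 3</div>
  </div>
  <div class="features__item">
    <div class="features__icon icon-house" aria-hidden="true"></div>
    <div class="features__label">Building Size 130m²</div>
  </div>
</div>
"""

SAMPLE_WITHOUT_SIZE = """
<div class="features">
  <div class="features__item">
    <div class="features__icon icon-bed" aria-hidden="true"></div>
    <div class="features__label">Bedrooms 3</div>
  </div>
</div>
"""


def building_size(markup):
    """Return the first run of digits from the 'Building Size' label, or None."""
    root = ET.fromstring(markup)
    for div in root.iter("div"):
        # Only look at label divs, wherever they sit in the tree.
        if div.get("class") != "features__label":
            continue
        text = "".join(div.itertext())
        # Case-sensitive substring check, like XPath contains().
        if "Building Size" in text:
            match = re.search(r"\d+", text)
            return match.group() if match else None
    # No label mentions a building size.
    return None


print(building_size(SAMPLE_WITH_SIZE))
print(building_size(SAMPLE_WITHOUT_SIZE))
```

As with `re_first`, the result is a string of digits, so convert it with `int(...)` if you need a number. The match is case-sensitive, so a label written as 'Building size' would not be found.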
Upvotes: 2