Reputation: 39
I've been trying to extract some text for a while now, and while everything works fine, there is something I can't manage to get.
Take this page: https://duproprio.com/fr/montreal/pierrefonds-roxboro/condo-a-vendre/hab-305-5221-rue-riviera-854000
I want to get the text from the nodes with class=listing-main-characteristics__number (below the picture, in the box with "2 chambres 1 salle de bain Aire habitable (s-sol exclu) 1,030 pi2 (95,69m2)"). There are 3 elements with that class on the page: "2", "1" and "1,030 pi² (95,69 m²)". I've tried a bunch of options in XPath and CSS, but none has worked, and some returned strange results.
For example, with:
response.xpath('//span[@class="listing-main-characteristics__number"]').getall()
I get :
['<span class="listing-main-characteristics__number">\n 2\n </span>', '<span class="listing-main-characteristics__number">\n 1\n </span>']
For example, something else that works just fine on the same webpage :
response.xpath('//div[@property="description"]/p/text()').getall()
If I get all the spans with this query :
response.css('span::text').getall()
I can find all three texts mentioned at the beginning in the output. But from this:
response.css('span[class=listing-main-characteristics__number]::text').getall()
I only get this
['\n 2\n ', '\n 1\n ']
Could someone clue me in with what kind of selection I would need? Thank you so much!
Upvotes: 0
Views: 45
Reputation: 14145
Here is an XPath that should work:
//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]
Append /text() if you want only the associated text. In Scrapy, the call would be:
response.xpath("//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]").getall()
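One plausible reason the original selectors returned only two items, which is an assumption since the page source is not shown here, is that the third span carries an additional class. Both @class="..." in XPath and span[class=...] in CSS compare the whole attribute value, so an element with two classes is skipped. The sketch below reproduces that behaviour with the standard library on hypothetical markup (the extra class name is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT copied from the real page: the third span is
# assumed to carry an extra modifier class.
SAMPLE = """
<div>
  <span class="listing-main-characteristics__number"> 2 </span>
  <span class="listing-main-characteristics__number"> 1 </span>
  <span class="listing-main-characteristics__number listing-main-characteristics__number--dimensions"> 1,030 pi² (95,69 m²) </span>
</div>
"""

TARGET = "listing-main-characteristics__number"
root = ET.fromstring(SAMPLE)

# Exact comparison of the whole attribute value, like @class="..." in XPath
# or span[class=...] in CSS: the span with two classes is skipped.
exact = [s.text.strip() for s in root.findall(f".//span[@class='{TARGET}']")]

# Token comparison: the class attribute is split on whitespace and the
# target only has to be one of the resulting tokens.
by_token = [
    s.text.strip()
    for s in root.iter("span")
    if TARGET in s.get("class", "").split()
]

print(exact)     # ['2', '1']
print(by_token)  # ['2', '1', '1,030 pi² (95,69 m²)']
```

If that assumption holds for the real page, the token-style equivalents in Scrapy would be response.css('span.listing-main-characteristics__number::text').getall() or an XPath using contains(@class, "listing-main-characteristics__number").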
Below is sample Python code using Selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://duproprio.com/fr/montreal/pierrefonds-roxboro/condo-a-vendre/hab-305-5221-rue-riviera-854000#description"
driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available locally
driver.get(url)
# get the output elements, then read the text from each of them
# (find_elements_by_xpath was removed in Selenium 4; use find_elements with By.XPATH)
outputs = driver.find_elements(By.XPATH, "//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]")
for output in outputs:
    # replace newline characters with spaces and trim the text
    print(output.text.replace("\n", " ").strip())
driver.quit()
Output:
2 chambres
1 salle de bain
1,030 pi² (95,69 m²)
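With Scrapy there is no Selenium-style .text property, so text nodes come back padded with newlines and spaces, as in the question's ['\n 2\n ', '\n 1\n ']. A minimal sketch of cleaning such a list follows; the helper name clean is hypothetical and the sample strings only mimic the shape of that output:

```python
def clean(texts):
    """Collapse runs of whitespace and drop entries that end up empty."""
    cleaned = (" ".join(t.split()) for t in texts)
    return [t for t in cleaned if t]


# Strings shaped like the result of .getall() on text nodes
raw = ["\n 2\n ", "\n 1\n ", "\n   "]
print(clean(raw))  # ['2', '1']
```

The same helper could be applied to the list returned by response.xpath(...).getall() once /text() is appended to the expression.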
Upvotes: 1