Bulbuzor

Reputation: 39

Scrapy can't extract text with either CSS or XPath

I've been trying to extract some text for a while now, and while everything works fine, there is something I can't manage to get.

Take this website : https://duproprio.com/fr/montreal/pierrefonds-roxboro/condo-a-vendre/hab-305-5221-rue-riviera-854000

I want to get the text from the class=listing-main-characteristics__number nodes (below the picture, in the box with "2 chambres 1 salle de bain Aire habitable (s-sol exclu) 1,030 pi² (95,69 m²)"). There are 3 elements with that class on the page ("2", "1" and "1,030 pi² (95,69 m²)"). I've tried a bunch of options in XPath and CSS, but none has worked, and some gave back strange results.

For example, with :

response.xpath('//span[@class="listing-main-characteristics__number"]').getall()

I get :

['<span class="listing-main-characteristics__number">\n 2\n </span>', '<span class="listing-main-characteristics__number">\n 1\n </span>']

For example, something else that works just fine on the same webpage :

response.xpath('//div[@property="description"]/p/text()').getall()

If I get all the spans with this query :

response.css('span::text').getall()

I can find the texts mentioned at the beginning in the results. But from this:

response.css('span[class=listing-main-characteristics__number]::text').getall()

I only get this:

['\n                        2\n                    ', '\n                        1\n                    ']
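The two values that do come back are padded with newlines and indentation from the page markup. A minimal sketch of tidying such results, assuming the list is shaped like the output above (the helper name `clean_texts` is hypothetical and not part of Scrapy):

```python
def clean_texts(raw_texts):
    """Collapse whitespace runs and drop strings that end up empty."""
    cleaned = (" ".join(text.split()) for text in raw_texts)
    return [text for text in cleaned if text]


# Sample input copied from the ::text query result above.
raw = ['\n                        2\n                    ',
       '\n                        1\n                    ']
print(clean_texts(raw))  # ['2', '1']
```

This only cleans up what the selector already returned; it does not recover the third value that the query misses.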

Could someone clue me in with what kind of selection I would need? Thank you so much!

Upvotes: 0

Views: 45

Answers (1)

supputuri

Reputation: 14145

Here is an XPath you can use:

//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]

In Scrapy, you can pass this XPath to response.xpath(). (Add /text() if you want the associated text.)

response.xpath("//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]").getall()

Below is sample Python code using Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://duproprio.com/fr/montreal/pierrefonds-roxboro/condo-a-vendre/hab-305-5221-rue-riviera-854000#description"
driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
driver.get(url)
# Get the matching elements, then read the text from each one.
outputs = driver.find_elements(By.XPATH, "//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]")
for output in outputs:
    # Replace newline characters with spaces and trim the text.
    print(output.text.replace("\n", " ").strip())
driver.quit()

Output:

2 chambres

1 salle de bain

1,030 pi² (95,69 m²)
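To experiment with this selection without a browser, the sketch below runs the two halves of the XPath against a small hand-written fragment. The markup is an assumption modeled on the class names in the XPath, not a copy of the real page, and the sketch uses Python's standard-library `xml.etree.ElementTree` in place of Scrapy or Selenium. ElementTree supports only a subset of XPath (no `|` union), so the two halves are queried separately; `node_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page may be structured differently.
FRAGMENT = """
<div data-label="#description">
  <div class="listing-main-characteristics__label">
    <span class="listing-main-characteristics__number">
      2
    </span>
    chambres
  </div>
  <div class="listing-main-characteristics__label">
    <span class="listing-main-characteristics__number">
      1
    </span>
    salle de bain
  </div>
  <div class="listing-main-characteristics__item-dimensions">
    <span>Aire habitable (s-sol exclu)</span>
    <span>1,030 pi² (95,69 m²)</span>
  </div>
</div>
"""


def node_text(node):
    """Join all text inside a node and collapse whitespace runs."""
    return " ".join("".join(node.itertext()).split())


# The root element here is already the div[@data-label='#description'] node.
root = ET.fromstring(FRAGMENT)
labels = root.findall(".//div[@class='listing-main-characteristics__label']")
dimensions = root.findall(
    ".//div[@class='listing-main-characteristics__item-dimensions']/span[2]"
)
results = [node_text(node) for node in labels + dimensions]
print(results)
```

Selecting the label div rather than the inner span is what brings "chambres" and "salle de bain" along with the numbers, since the helper gathers the text of the whole element.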


Upvotes: 1
