Alok Kumar

Reputation: 69

Extract all text from child elements/nodes using the XPath text() function

I am using Scrapy and want to get all the text from the child nodes. I am using the command below in Scrapy to get the text:

response.xpath('//div[@class="A"]/text()').get()

I am expecting the result: "1 -120u"

<div class="A">
<span id="B" class="C">
<span>1&nbsp;</span>-110o</span>
<span id="B">
<span>1&nbsp;</span>
-120u</span>
</div>

I have also tried the following, which I found on Stack Overflow:

response.xpath('//div[@class="A"]/text()').getall()
response.xpath('//div[@class="A"]/text()').extract()
response.xpath('//div[@class="A"]//text()').get()
response.xpath('//div[@class="A"]//text()').getall()

My trial run in the Scrapy shell

Upvotes: 0

Views: 2185

Answers (1)

AnyaPi

Reputation: 48

This should work to select all text inside div.A:

response.xpath('//div[@class="A"]//text()').getall()

And to filter out white-space strings:

response.xpath('//div[@class="A"]//text()[normalize-space()]').getall()

If you're looking to output "1 -120u" then you could:

substrings = response.css('span#B:not(.C)').xpath('.//text()[normalize-space()]').getall()
''.join(substrings)

This uses a CSS selector to select the span with an id of B that does not have the class C, then chains an XPath selector to grab all the non-whitespace text inside that span. It returns a list of substrings, which you join together to get a single string like "1 -120u".
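The joined string can still carry the non-breaking space that &nbsp; produces and the line break that precedes -120u in the markup. Below is a minimal sketch of one way to collapse those into single spaces with plain Python. The helper name clean_text and the sample list are illustrative assumptions, not part of Scrapy:

```python
def clean_text(substrings):
    """Join text fragments and collapse every whitespace run into one space."""
    joined = "".join(substrings)
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the non-breaking space (\xa0) and newlines.
    return " ".join(joined.split())


# Assumed sample: fragments resembling what the selector might return
# for the second span in the question's HTML.
sample = ["1\xa0", "\n-120u"]
print(clean_text(sample))
```

In a spider this would be called as clean_text(substrings) in place of ''.join(substrings).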

Additional explanation:

The text you're trying to select isn't a direct child of the div; it's nested inside layers of span elements.

  • div/text() selects only text that's a direct child of div

  • div//text() selects all text that's a descendant of div, at any depth

  • .get() is for selecting one result - if your selector matches several nodes, this method returns only the first of them (or None if nothing matches)

  • .getall() will return a list of results when your selector picks up multiple results, as is the case in your scenario

Upvotes: 3
