Reputation: 69
I am using Scrapy and want to get all the text inside the child nodes of a div. I am using the command below to get the text:
response.xpath('//div[@class="A"]/text()').get()
I expect the result "1 -120u" from this HTML:
<div class="A">
<span id="B" class="C">
<span>1 </span>-110o</span>
<span id="B">
<span>1 </span>
-120u</span>
</div>
I have also tried the following, which I found on Stack Overflow:
response.xpath('//div[@class="A"]/text()').getall()
response.xpath('//div[@class="A"]/text()').extract()
response.xpath('//div[@class="A"]//text()').get()
response.xpath('//div[@class="A"]//text()').getall()
Upvotes: 0
Views: 2185
Reputation: 48
This should work to select all text inside div.A:
response.xpath('//div[@class="A"]//text()').getall()
And to filter out whitespace-only strings:
response.xpath('//div[@class="A"]//text()[normalize-space()]').getall()
If you're looking to output "1 -120u" then you could:
substrings = response.css('span#B:not(.C)').xpath('.//text()[normalize-space()]').getall()
''.join(substrings)
This uses a CSS selector to select the span with an id of B but without the class C, then chains an XPath selector to grab all the non-whitespace text inside that span. That returns a list of substrings, which you join into a single string like "1 -120u". Line breaks in the markup remain part of the text nodes, so collapse whitespace afterwards if you need that exact string.
Additional explanation:
The text you're trying to select isn't a direct child of div - it's inside layers of span elements.
div/text() selects only text that's a direct child of div
div//text() selects all text that's a descendant of div
.get() is for selecting one result - if your selector yields a list of results this method will return the first item in that list
.getall() will return a list of results when your selector picks up multiple results, as is the case in your scenario
Upvotes: 3