HTML content:
<div class="txt-block">
<h4 class="inline">Release Date:</h4> 26 April 2019 (USA)
<span class="see-more inline"></span>
</div>
My XPath:
>>> response.xpath("//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()")
[<Selector xpath="//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()" data='\n '>,
<Selector xpath="//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()" data=' 26 April 2019 (USA)\n '>,
<Selector xpath="//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()" data='\n '>]
Can someone explain why I am getting a list with three results? It should return only one: the actual release date, 26 April 2019 (USA).
This part of your XPath,
//div[@class='txt-block']/h4[contains(text(), 'Release Date')]
selects the h4. Then /parent::div selects the parent div. From there, the final step, text(), selects all text-node children of that div, of which there are three: two containing only whitespace, and one containing " 26 April 2019 (USA)\n ".
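To see where the three text nodes come from without running Scrapy, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in. This is an assumption for illustration: it is not what Scrapy uses internally, and it can only parse this snippet because the snippet happens to be well-formed XML. ElementTree stores the text before an element's first child in .text and the text after each child in that child's .tail, so collecting those strings gives the div's text-node children, the same things text() selects. The helper text_children is hypothetical.

```python
import xml.etree.ElementTree as ET

# The snippet from the question; it is well-formed XML, so the
# standard-library parser can read it (real-world HTML often is not).
HTML = """<div class="txt-block">
<h4 class="inline">Release Date:</h4> 26 April 2019 (USA)
<span class="see-more inline"></span>
</div>"""


def text_children(elem):
    """Hypothetical helper: return elem's text-node children in document order.

    ElementTree keeps the text before the first child in elem.text and the
    text after each child in child.tail; together these are the text nodes
    that an XPath text() step on elem would select.
    """
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        if child.tail:
            nodes.append(child.tail)
    return nodes


div = ET.fromstring(HTML)
print(text_children(div))  # ['\n', ' 26 April 2019 (USA)\n', '\n']
```

The newline before the h4, the text after the h4, and the newline after the span are three separate nodes, which is why the selector list has three entries.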
If you only want 26 April 2019 (USA), use this XPath instead:
//div[@class='txt-block']/h4[.='Release Date:']/following-sibling::text()[1]
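A rough standard-library sketch of what this XPath does, again using xml.etree.ElementTree as a stand-in for Scrapy (an assumption, not Scrapy's API). It finds the h4 whose string value is exactly 'Release Date:' (ElementTree supports the [.='text'] predicate since Python 3.7), then takes the text immediately after it. In this snippet that text is the h4's .tail, which plays the role of following-sibling::text()[1]. The helpers release_date and normalize_space are hypothetical; the latter only approximates XPath's normalize-space().

```python
import xml.etree.ElementTree as ET

HTML = """<div class="txt-block">
<h4 class="inline">Release Date:</h4> 26 April 2019 (USA)
<span class="see-more inline"></span>
</div>"""


def normalize_space(s):
    # Approximation of XPath normalize-space(): strip both ends and
    # collapse internal runs of whitespace into a single space.
    return " ".join(s.split())


def release_date(html):
    div = ET.fromstring(html)
    if div.get("class") != "txt-block":
        return None
    # Same test as h4[.='Release Date:']: exact match on the string value.
    h4 = div.find("h4[.='Release Date:']")
    if h4 is None:
        return None
    # h4.tail is the text right after </h4>. This matches
    # following-sibling::text()[1] here, but would be empty if the h4 were
    # directly followed by another element, where XPath would skip ahead.
    return normalize_space(h4.tail or "")


print(release_date(HTML))  # 26 April 2019 (USA)
```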
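Notes:
- You may want to wrap the expression in normalize-space() to consolidate whitespace; this also drops the leading space and trailing newline around the date.
- The XPath above tests the string value of the h4 directly ([.='Release Date:']) instead of using contains(), but your original condition would work as well.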