Hackaholic

Reputation: 19781

XPath for sibling text?

HTML content:

<div class="txt-block">
    <h4 class="inline">Release Date:</h4> 26 April 2019 (USA)
    <span class="see-more inline"></span>
</div>

My XPath:

>>> response.xpath("//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()")
[<Selector xpath="//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()" data='\n    '>,
 <Selector xpath="//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()" data=' 26 April 2019 (USA)\n    '>,
 <Selector xpath="//div[@class='txt-block']/h4[contains(text(), 'Release Date')]/parent::div/text()" data='\n    '>]

Can someone explain why I am getting a list with three results? I expected only one, containing the actual release date: 26 April 2019 (USA).

Upvotes: 2

Views: 406

Answers (1)

kjhughes

Reputation: 111786

This part of your XPath,

//div[@class='txt-block']/h4[contains(text(), 'Release Date')]

selects the h4. Then /parent::div selects its parent div. From there, the final step, text(), selects every text node child of that div. There are three: two containing only whitespace, and one containing " 26 April 2019 (USA)\n    ".
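To see those three text nodes without Scrapy, here is a minimal sketch using only Python's standard library. It assumes the snippet from the question is well-formed enough to parse as XML (real pages usually are not), and the names HTML and text_children are made up for illustration.

```python
from xml.dom import minidom

# The fragment from the question, assumed here to be well-formed XML.
HTML = (
    '<div class="txt-block">\n'
    '    <h4 class="inline">Release Date:</h4> 26 April 2019 (USA)\n'
    '    <span class="see-more inline"></span>\n'
    '</div>'
)

div = minidom.parseString(HTML).documentElement

# Roughly what div/text() selects: the text nodes that are direct
# children of the div, not text nested inside h4 or span.
text_children = [n.data for n in div.childNodes if n.nodeType == n.TEXT_NODE]

for t in text_children:
    print(repr(t))
```

The whitespace before the h4 and after the span each form their own text node, which is why the date is only one of three results.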

If you only want 26 April 2019 (USA), use this XPath instead:

//div[@class='txt-block']/h4[.='Release Date:']/following-sibling::text()[1]

Notes:

  • You can wrap that in normalize-space() to strip leading and trailing whitespace and collapse internal runs of whitespace to a single space.
  • I've shown you how to test the string value of h4 instead of using contains(), but your original condition would work as well.
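As a rough standard-library analogue of following-sibling::text()[1] combined with normalize-space(), the sketch below walks the siblings of the h4 by hand. It is not how Scrapy evaluates the XPath; the helpers first_following_text and normalize_space are hypothetical names, and the snippet is again assumed to parse as XML.

```python
from xml.dom import minidom

# The fragment from the question, assumed here to be well-formed XML.
HTML = (
    '<div class="txt-block">\n'
    '    <h4 class="inline">Release Date:</h4> 26 April 2019 (USA)\n'
    '    <span class="see-more inline"></span>\n'
    '</div>'
)


def first_following_text(node):
    """Return the data of the first text node among node's following siblings."""
    sib = node.nextSibling
    while sib is not None:
        if sib.nodeType == sib.TEXT_NODE:
            return sib.data
        sib = sib.nextSibling
    return None


def normalize_space(s):
    """Approximate XPath normalize-space(): trim ends, collapse inner runs."""
    return " ".join(s.split())


div = minidom.parseString(HTML).documentElement

# Pick the h4 whose text is exactly 'Release Date:', like h4[.='Release Date:'].
h4 = next(
    el for el in div.getElementsByTagName("h4")
    if el.firstChild is not None and el.firstChild.data == "Release Date:"
)

release_date = normalize_space(first_following_text(h4))
print(release_date)
```

Starting from the h4 rather than its parent is what narrows the result to the single text node that follows the label.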

Upvotes: 4
