Reputation: 403
I have the following HTML:
<div id="content-home">
  <p>some date 1</p>
  <div class="content"><p>bla1.1</p></div>
  <div class="content"><p>bla1.2</p></div>
  <p>some date 2</p>
  <div class="content"><p>bla2.1</p></div>
  <div class="content"><p>bla2.2</p></div>
  <div class="content"><p>bla2.3</p></div>
  <p>some date 3</p>
  <div class="content"><p>bla3.1</p></div>
  <div class="content"><p>bla3.2</p></div>
  <div class="content"><p>bla3.3</p></div>
  <div class="content"><p>bla3.4</p></div>
</div>
With XPath I want to get the date that precedes each <div class="content">. With this:
tree.xpath("//div[@id='content-home']/p[following-sibling::div[@class='content']]/text()")
and also
tree.xpath("//div[@id='content-home']/p[preceding-sibling::div[@class='content']]/text()")
I only get a list with 3 entries. I want to get 9 entries (2x date 1, 3x date 2 and 4x date 3). I tried a lot of things but keep getting 3 entries (date 1, date 2, date 3). How can I achieve this? What I actually want to do is record the date for each content div.
Can someone help please?
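A minimal reproduction, assuming lxml (which the tree.xpath calls suggest) and writing the question's /// as the standard //, shows why the count is stuck: both expressions select p children of the content-home div, and an XPath 1.0 node-set holds each node at most once, so the result can never have more entries than there are such p elements (three).

```python
from lxml import etree  # assumption: the question's tree.xpath comes from lxml

# The question's HTML; it is well-formed XML, so etree can parse it as-is.
HTML = """
<div id="content-home">
<p>some date 1</p>
<div class="content"><p>bla1.1</p></div>
<div class="content"><p>bla1.2</p></div>
<p>some date 2</p>
<div class="content"><p>bla2.1</p></div>
<div class="content"><p>bla2.2</p></div>
<div class="content"><p>bla2.3</p></div>
<p>some date 3</p>
<div class="content"><p>bla3.1</p></div>
<div class="content"><p>bla3.2</p></div>
<div class="content"><p>bla3.3</p></div>
<div class="content"><p>bla3.4</p></div>
</div>
"""

tree = etree.fromstring(HTML.strip())

# First attempt: p children that have a content div somewhere after them.
dates = tree.xpath(
    "//div[@id='content-home']/p[following-sibling::div[@class='content']]/text()"
)
# Second attempt: p children that have a content div somewhere before them.
dates_after = tree.xpath(
    "//div[@id='content-home']/p[preceding-sibling::div[@class='content']]/text()"
)

# Both paths end in p/text(), so each item is the text of one <p> child;
# a node-set never contains the same node twice, so neither list can be
# longer than the number of <p> children.
p_count = len(tree.xpath("//div[@id='content-home']/p"))

print(dates)        # ['some date 1', 'some date 2', 'some date 3']
print(dates_after)  # ['some date 2', 'some date 3']
print(p_count)      # 3
```

Getting one date per content div therefore means starting from the div elements rather than from the p elements.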
Upvotes: 0
Views: 2584
Reputation: 363487
I don't immediately see a single XPath expression that does this, but some intermediate Python makes it easy enough:
>>> divs = x.xpath("//div[@class='content'][preceding-sibling::p]")
>>> [d.xpath("string((preceding-sibling::p)[last()])")
... for d in divs]
['some date 1', 'some date 1', 'some date 2', 'some date 2', 'some date 2', 'some date 3', 'some date 3', 'some date 3', 'some date 3']
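The session above assumes x is the already-parsed document. A self-contained sketch of the same approach, assuming lxml and the HTML from the question, that also pairs each date with the text of its content div (the "record the date for each content div" part of the question); the names dates and pairs are just illustrative:

```python
from lxml import etree  # assumption: lxml is installed

HTML = """
<div id="content-home">
<p>some date 1</p>
<div class="content"><p>bla1.1</p></div>
<div class="content"><p>bla1.2</p></div>
<p>some date 2</p>
<div class="content"><p>bla2.1</p></div>
<div class="content"><p>bla2.2</p></div>
<div class="content"><p>bla2.3</p></div>
<p>some date 3</p>
<div class="content"><p>bla3.1</p></div>
<div class="content"><p>bla3.2</p></div>
<div class="content"><p>bla3.3</p></div>
<div class="content"><p>bla3.4</p></div>
</div>
"""

tree = etree.fromstring(HTML.strip())

# Every content div that has at least one <p> sibling before it.
divs = tree.xpath("//div[@class='content'][preceding-sibling::p]")

# Evaluate a second XPath relative to each div: the nearest preceding
# <p> sibling, converted to a single string.
dates = [d.xpath("string((preceding-sibling::p)[last()])") for d in divs]

# Pair each date with the string value of the div it belongs to.
pairs = [(date, d.xpath("string(.)")) for date, d in zip(dates, divs)]

for date, text in pairs:
    print(date, "->", text)  # first line: some date 1 -> bla1.1
```

Running one small query per div is simple and easily fast enough for a page of this size.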
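The second XPath expression can be read inside out: preceding-sibling::p denotes the preceding siblings of the div under consideration that are p elements. Of these, (preceding-sibling::p)[last()] is the last one in document order, i.e. the nearest to the div. You need the parentheses because [] binds more strongly than the axis step: without them the [last()] predicate becomes part of the step itself, and on a reverse axis such as preceding-sibling positions are counted backwards from the div, so preceding-sibling::p[last()] would pick the farthest p (the first date) instead. preceding-sibling::p[1] is an equivalent way to get the nearest one.
This is then wrapped in a string() call (because text() is a code smell) to get the string value out as a single string rather than a list of text nodes.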
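To see why the parentheses matter, a small sketch (assuming lxml, on a trimmed-down two-date version of the question's HTML) compares the three forms on the last content div and contrasts text() with string():

```python
from lxml import etree  # assumption: lxml is installed

# Trimmed-down version of the question's HTML: two dates, one div each.
HTML = """<div id="content-home">
<p>some date 1</p>
<div class="content"><p>bla1.1</p></div>
<p>some date 2</p>
<div class="content"><p>bla2.1</p></div>
</div>"""

root = etree.fromstring(HTML)
div = root.xpath("//div[@class='content']")[-1]  # the bla2.1 div

# Predicate inside the step: on the reverse preceding-sibling axis,
# position 1 is the nearest p, so last() is the farthest one.
farthest = div.xpath("string(preceding-sibling::p[last()])")

# Parenthesised: the predicate filters the node-set in document order,
# so last() is the p closest to the div.
nearest = div.xpath("string((preceding-sibling::p)[last()])")

# Same node as the parenthesised form: [1] on the reverse axis.
nearest_alt = div.xpath("string(preceding-sibling::p[1])")

# text() yields a list of text nodes; string() yields one string,
# and an empty string when nothing matches.
as_list = div.xpath("(preceding-sibling::p)[last()]/text()")
missing = div.xpath("string(following-sibling::p)")

print(farthest)       # some date 1
print(nearest)        # some date 2
print(nearest_alt)    # some date 2
print(as_list)        # ['some date 2']
print(repr(missing))  # ''
```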
Upvotes: 4