Geveze

Reputation: 403

xpath for preceding sibling

I have the following html:

<div id="content-home">
  <p>some date 1</p>
  <div class="content"><p>bla1.1</p></div>
  <div class="content"><p>bla1.2</p></div>
  <p>some date 2</p>
  <div class="content"><p>bla2.1</p></div>
  <div class="content"><p>bla2.2</p></div>
  <div class="content"><p>bla2.3</p></div>
  <p>some date 3</p>
  <div class="content"><p>bla3.1</p></div>
  <div class="content"><p>bla3.2</p></div>
  <div class="content"><p>bla3.3</p></div>
  <div class="content"><p>bla3.4</p></div>
</div>

With XPath I want to get the date that belongs to each div with class "content", i.e. the nearest p before it. With this:

tree.xpath("//div[@id='content-home']/p[following-sibling::div[@class='content']]/text()")

and also

tree.xpath("//div[@id='content-home']/p[preceding-sibling::div[@class='content']]/text()")

I only get a list with 3 entries. I want to get back 9 entries (2x date 1, 3x date 2 and 4x date 3). I tried a lot of things but keep getting 3 entries (date 1, date 2, date 3). How can I achieve this? What I actually want to do is record the date for each content div.

Can someone help please?

Upvotes: 0

Views: 2584

Answers (1)

Fred Foo

Reputation: 363487

I don't immediately see a single XPath expression that does this, but some intermediate Python makes it easy enough (x here is the parsed tree, called tree in the question):

>>> divs = x.xpath("//div[@class='content'][preceding-sibling::p]")
>>> [d.xpath("string((preceding-sibling::p)[last()])")
...  for d in divs]
['some date 1', 'some date 1', 'some date 2', 'some date 2', 'some date 2', 'some date 3', 'some date 3', 'some date 3', 'some date 3']

The second XPath expression can be read inside out:

preceding-sibling::p

denotes the preceding siblings of the div under consideration which have tag p. Of these,

(preceding-sibling::p)[last()]

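is the last one in document order, i.e. the p closest to the div. You need the parentheses because [] binds more strongly than ::. Without them, [last()] would be applied along the reverse preceding-sibling axis, where positions count backwards from the div, and it would pick the farthest p instead. (For the same reason, preceding-sibling::p[1] without parentheses also selects the closest p.)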

This is then wrapped in a string() call to get the string value out (because text() is a code smell: it selects only the direct text nodes and misses text inside nested elements).
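
For comparison, the same pairing can be done without the preceding-sibling axis at all, by walking the children of the container once and remembering the most recent p. This is not the approach above, just a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the markup parses as well-formed XML (real-world HTML would more likely need lxml.html), and the name dates_for_content is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Sample markup from the question; assumed to be well-formed XML.
HTML = """
<div id="content-home">
  <p>some date 1</p>
  <div class="content"><p>bla1.1</p></div>
  <div class="content"><p>bla1.2</p></div>
  <p>some date 2</p>
  <div class="content"><p>bla2.1</p></div>
  <div class="content"><p>bla2.2</p></div>
  <div class="content"><p>bla2.3</p></div>
  <p>some date 3</p>
  <div class="content"><p>bla3.1</p></div>
  <div class="content"><p>bla3.2</p></div>
  <div class="content"><p>bla3.3</p></div>
  <div class="content"><p>bla3.4</p></div>
</div>
"""


def dates_for_content(container):
    """Return one date string per <div class="content"> child, in document order.

    Hypothetical helper: only direct children of `container` are inspected.
    """
    dates = []
    current_date = None
    for child in container:
        if child.tag == "p":
            # A direct <p> child starts a new date group.
            current_date = "".join(child.itertext()).strip()
        elif child.tag == "div" and child.get("class") == "content":
            # Skip content divs that appear before any date.
            if current_date is not None:
                dates.append(current_date)
    return dates


root = ET.fromstring(HTML.strip())  # root is the div with id "content-home"
result = dates_for_content(root)
print(result)
```

The single pass does the same job as asking each div for its closest preceding p, but it only looks at direct children of the container, so it would need adjusting if the dates and content divs were nested differently.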

Upvotes: 4
