Reputation: 65
I have the following block:
<div class="text">
<h1>head1</h1>
Text1 <br/><br/> text12 <br/><br/> text 13
<h1>head11</h1>
Text11
<h3>head3</h3>
Text2
</div>
How can I get the text after the first h1, ignoring the <br/><br/> tags, so that the result is:
Text1 text12 text 13
I am using Grab (Python):
page = g.doc.select('//div[@class="text"]/h1[1]/following-sibling::text()')
The result is:
Text1 text12 text 13 Text11 Text2
Upvotes: 2
Views: 604
Reputation: 52848
You could try selecting only the text() nodes that have exactly one preceding h1 sibling:
//div[@class='text']/text()[count(preceding-sibling::h1)=1]
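To make the predicate concrete, here is a rough sketch of the same idea in plain Python, using only the standard library's xml.etree.ElementTree rather than Grab. ElementTree cannot evaluate text() or count(), so this walks the children by hand. The helper name text_after_nth_h1 is made up for illustration, and the sketch assumes the fragment is well-formed XML, as the one in the question is.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="text">
<h1>head1</h1>
Text1 <br/><br/> text12 <br/><br/> text 13
<h1>head11</h1>
Text11
<h3>head3</h3>
Text2
</div>"""


def text_after_nth_h1(div, n=1):
    """Collect text whose number of preceding h1 siblings equals n.

    Hypothetical helper mirroring text()[count(preceding-sibling::h1)=n].
    """
    seen = 0      # how many h1 siblings we have passed so far
    parts = []
    for child in div:
        if child.tag == "h1":
            seen += 1
        # child.tail is the text node that directly follows this child
        if seen == n and child.tail:
            parts.append(child.tail)
    # Collapse newlines and repeated spaces into single spaces
    return " ".join(" ".join(parts).split())


div = ET.fromstring(HTML)
print(text_after_nth_h1(div, 1))
```

Note that with n=2 this returns the text after the second h1 up to the end of the div, because the h3 does not count as an h1; the XPath predicate behaves the same way.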
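Another alternative is the Kayessian method, which intersects two node-sets (here, the text nodes following the first h1 and the text nodes preceding the second h1):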
//div[@class='text']/h1[1]/following-sibling::text()[count(.|//div[@class='text']/h1[1+1]/preceding-sibling::text()) = count(//div[@class='text']/h1[1+1]/preceding-sibling::text())]
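The count(.|$ns2) = count($ns2) test is XPath 1.0's way of asking "is this node already in $ns2?", so the whole expression is a set intersection. The sketch below imitates that with Python sets over the question's fragment. It is only an illustration using the standard library, not what Grab does internally, and the flatten helper is hypothetical.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="text">
<h1>head1</h1>
Text1 <br/><br/> text12 <br/><br/> text 13
<h1>head11</h1>
Text11
<h3>head3</h3>
Text2
</div>"""


def flatten(parent):
    """Return parent's child nodes in document order as (kind, value) pairs."""
    items = []
    if parent.text:
        items.append(("text", parent.text))
    for child in parent:
        items.append(("elem", child.tag))
        if child.tail:
            items.append(("text", child.tail))
    return items


items = flatten(ET.fromstring(HTML))

# Positions of the h1 elements among the div's children
h1_positions = [i for i, (kind, value) in enumerate(items)
                if kind == "elem" and value == "h1"]
first_h1, second_h1 = h1_positions[0], h1_positions[1]

text_positions = {i for i, (kind, _) in enumerate(items) if kind == "text"}

# ns1: h1[1]/following-sibling::text()
following_first = {i for i in text_positions if i > first_h1}
# ns2: h1[2]/preceding-sibling::text()
preceding_second = {i for i in text_positions if i < second_h1}

# The Kayessian expression keeps the nodes that are in both sets
between = sorted(following_first & preceding_second)
result = " ".join(" ".join(items[i][1] for i in between).split())
print(result)
```

The whitespace-only text node before the first h1 is in the second set but not the first, so the intersection drops it.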
Here's a better example and explanation of the Kayessian method.
Upvotes: 1