Xpath to extract text between specific div tag and next div

Question

I want to extract the text in the <p> elements between the div tag 'Heading1' and the next div tag, in the example below. I can't use 'Heading2' to isolate the next div, as this text may change.

library(XML)
# create example html
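html <- '
<div class="AAA">Heading1</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">Heading2</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">Heading3</div>
'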

doc <- htmlParse(html)

xpath <- "//p[preceding::div[@class='AAA' and contains(., 'Heading1')]]"

xpathSApply(doc, xpath, xmlValue)

This works up to here, but I'm stuck with limiting the xpath at the next div. I have tried using the following, thinking it would get the next div.

"//p[preceding::div[@class='AAA' and contains(., 'Heading1')]and following::div[position()=1]]"

Daniel Haley · Accepted Answer

I don't think it's necessary to test the next div. You should be able to do something like this...

//p[preceding-sibling::div[1][normalize-space()='Heading1']]

or this if the class matters...

//p[preceding-sibling::div[1][@class='AAA'][normalize-space()='Heading1']]

or this if you need to still use contains()...

//p[preceding-sibling::div[1][@class='AAA'][contains(normalize-space(),'Heading1')]]
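As a rough check, here is a minimal sketch that runs the question's expression and the second one above against the example in R. The HTML layout is an assumption on my part (headings as div elements with class 'AAA', the wanted text in p siblings that follow them), not the asker's verbatim markup. The key piece is preceding-sibling::div[1], which looks only at each p's nearest preceding div sibling, so a p is kept only when that nearest div is 'Heading1'.

```r
library(XML)

# Assumed layout: each heading is a <div class="AAA"> and the text
# sits in <p> siblings that follow it (a reconstruction, not verbatim).
html <- '
<div class="AAA">Heading1</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">Heading2</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">Heading3</div>
'

# asText = TRUE tells htmlParse the string is markup, not a file name
doc <- htmlParse(html, asText = TRUE)

# The question's expression: every <p> that comes after the Heading1 div
loose <- "//p[preceding::div[@class='AAA' and contains(., 'Heading1')]]"
all_p <- xpathSApply(doc, loose, xmlValue)

# Keep a <p> only if its nearest preceding sibling div is Heading1
strict <- "//p[preceding-sibling::div[1][@class='AAA'][normalize-space()='Heading1']]"
wanted <- xpathSApply(doc, strict, xmlValue)

print(length(all_p))
print(wanted)
```

With this layout, all_p should hold all six paragraphs and wanted only the three 'I want' strings. It also suggests why adding following::div[position()=1] did not help: that predicate is true for any p with at least one div somewhere after it, which here is every p, so it filters nothing.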
