chheplo

Reputation: 211

XPath - Select text() between two DIVs identified by their matching text

I have this HTML:

<div id="General" class="detailOn">
    <div class="tabconstraint"></div>
    <div id="InstitutionMain" class="detailseparate">
        <div id="InstitutionMain_divINFORight" style="float:right;width:40%"></div>
        <div style="font-weight:bold;padding-top:6px">Special Learning Opportunities</div>
        Distance learning opportunities<br>

        <div style="font-weight:bold;padding-top:6px">Student Services</div>
        Remedial services<br>
        Academic/career counseling service<br>

        <div style="font-weight:bold;padding-top:6px">Credit Accepted</div>
        Dual credit<br>
        Credit for life experiences<br>
    </div>
</div>

I want to extract the text() nodes that sit between the div whose text is "Special Learning Opportunities" and the div whose text is "Student Services" (here, "Distance learning opportunities"), and similarly for the other divs.

I tried this expression, which gives me all the text following the identified div:

div[1]/div[contains(text(),"Special Learning Opportunities")]/following-sibling::text()

This one gives me all the text before the div:

div[1]/div[contains(text(),"Student Services")]/preceding-sibling::text()

Is there a way to get exactly the text between the specified DIVs? Thanks in advance.

I am using Python 2.x and Scrapy for crawling.

Note: my current method uses these three XPaths:

item['SLO']=site.select('div[1]/div[contains(text(),"Special Learning Opportunities")]/following-sibling::text()').extract()
item['SS']=site.select('div[1]/div[contains(text(),"Student Services")]/following-sibling::text()').extract()
item['CA']=site.select('div[1]/div[contains(text(),"Credit Accepted")]/following-sibling::text()').extract()

I get three items like this:

item['SLO']=['Distance learning opportunities','Remedial services',' Academic/career counseling service','Dual credit','Credit for life experiences']
item['SS']=['Remedial services',' Academic/career counseling service','Dual credit','Credit for life experiences']
item['CA']=['Dual credit','Credit for life experiences']

and then I post-process the Python lists to get what I want.

But I think there should be a quicker way to do this in XPath.
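For illustration, the list post-processing mentioned above could be sketched roughly as follows. Each `following-sibling::text()` result is a suffix of the one before it, so a section is whatever remains once the next heading's results are cut off the end. The helper name `split_sections` is hypothetical, and the hard-coded lists are simply copied from the output shown above rather than produced by Scrapy.

```python
def split_sections(following):
    """Turn overlapping 'everything after heading X' lists into per-section lists.

    `following` is a list of (key, texts) pairs in document order, where
    `texts` holds every text node that follows that heading.
    """
    sections = {}
    for i, (key, texts) in enumerate(following):
        # Number of trailing entries that belong to later headings.
        next_len = len(following[i + 1][1]) if i + 1 < len(following) else 0
        own = texts[:len(texts) - next_len]
        # Strip stray whitespace and drop whitespace-only nodes.
        sections[key] = [t.strip() for t in own if t.strip()]
    return sections


# Sample data copied from the question (assumed to be in document order).
following = [
    ("SLO", ["Distance learning opportunities", "Remedial services",
             " Academic/career counseling service", "Dual credit",
             "Credit for life experiences"]),
    ("SS", ["Remedial services", " Academic/career counseling service",
            "Dual credit", "Credit for life experiences"]),
    ("CA", ["Dual credit", "Credit for life experiences"]),
]

sections = split_sections(following)
```

This only holds when every list comes from the same parent element and the headings are passed in document order; it does not replace a single XPath that selects the bounded range directly.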

Upvotes: 3

Views: 5817

Answers (3)

Ifzal.Ahmed

Reputation: 116

You can try this:

//div[contains(text(),"Special Learning Opportunities")]//following-sibling::text()[./following-sibling::div[contains(text(),'Student Services')]]

Upvotes: 1

BeniBela

Reputation: 16907

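You can translate "text between a and b" almost directly into XPath as "text()[preceding-sibling = a and following-sibling = b]", that is, a text node whose nearest preceding div is a and whose nearest following div is b.

I.e.: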

//text()[(preceding-sibling::div[1]/text() = "Special Learning Opportunities") and (following-sibling::div[1]/text() = "Student Services")]

This should work (it failed when I tested it, but that seems to be a bug in my XPath interpreter).
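As a way to sanity-check the expected result without relying on any particular XPath engine, the same "nearest preceding div" idea can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not XPath: it walks the markup once and files each loose text line under the most recent heading div. The class name `SectionCollector` and the trimmed copy of the question's HTML are assumptions made for this sketch.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question.
HTML = """
<div id="General" class="detailOn">
    <div class="tabconstraint"></div>
    <div id="InstitutionMain" class="detailseparate">
        <div id="InstitutionMain_divINFORight" style="float:right;width:40%"></div>
        <div style="font-weight:bold;padding-top:6px">Special Learning Opportunities</div>
        Distance learning opportunities<br>

        <div style="font-weight:bold;padding-top:6px">Student Services</div>
        Remedial services<br>
        Academic/career counseling service<br>

        <div style="font-weight:bold;padding-top:6px">Credit Accepted</div>
        Dual credit<br>
        Credit for life experiences<br>
    </div>
</div>
"""


class SectionCollector(HTMLParser):
    """Group loose text under the nearest preceding child <div> (hypothetical helper)."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0        # div nesting inside the container; 0 means outside it
        self.heading = None   # text of the most recent heading div
        self.sections = {}    # heading text -> list of text lines

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.depth == 2:
            # Text inside a direct child div starts a new section.
            self.heading = text
            self.sections[text] = []
        elif self.depth == 1 and self.heading is not None:
            # Loose text directly in the container belongs to the last heading.
            self.sections[self.heading].append(text)


collector = SectionCollector("InstitutionMain")
collector.feed(HTML)
sections = collector.sections
```

The text collected under "Special Learning Opportunities" is what the XPath above is meant to select, once whitespace-only text nodes are ignored.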

Upvotes: 4

Jony Adamit

Reputation: 3416

Here you go. It's not as classy as the previous answer, but hey, at least it works! :-)

div[1]//div[contains(text(),"Special Learning Opportunities")]/following-sibling::node()[position() <= count( div[1]//div[contains(text(),"Student Services")]/following-sibling::node()) + 1]

Upvotes: 2
