Zaffar Saffee

Excluding a tag in the DOM tree when running an XPath query

I have an HTML structure like this:

<div id="divid"> 
<ul id="ulid">
<li style="margin-left: 8px">
        <strong>books</strong>
</li>
<li style="margin-left: 6px">
        <a href="">
        <span id="spanid">first line</span></a>
</li> 
<li style="margin-left: 6px">
        <a href="">
        <span id="spanid">2nd line</span></a>
</li>
</ul>
</div>

I am parsing this HTML fragment with the following XPath query:

$xpath->query('//div[@id="divid"]/ul[@id="ulid"]/li/a');

The output I want is:

first line
2nd line

As far as I understand, my XPath query would be fine if the first "li" tag did not contain a "strong" tag.

Note that the first 'li' tag contains a 'strong' tag, while all the other 'li' tags contain an anchor with a 'span' tag inside it.
Now I want to exclude the 'li' tag that contains the 'strong' tag from my XPath query, so that I get only the values of the tags under the anchor in each 'li' tag.
How can I modify the XPath query to make that possible? Any guidelines?

The original markup I was working on was:

<ul data-typeid="n" id="ref_1000">
    <li style="margin-left: -18px;">
        <a href="/s/ref=sr_ex_n_0?rh=i%3Aaps%2Ck%3Ahow+to+grow+tomatoes&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327692925">‹ <span class="expand">Any Department</span></a>
    </li>
    <li style="margin-left: 8px;">
        <strong>Books</strong>
    </li>
    <li style="margin-left: 6px;">
        <a href="/s/ref=sr_nr_n_0?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A48&amp;bbn=1000&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327692925&amp;rnid=1000">
            <span class="refinementLink">Crafts, Hobbies &amp; Home</span><span class="narrowValue"> (19)</span>
        </a>
    </li>
    <li style="margin-left: 6px;">
        <a href="/s/ref=sr_nr_n_1?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A10&amp;bbn=1000&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327692925&amp;rnid=1000">
            <span class="refinementLink">Health, Fitness &amp; Dieting</span><span class="narrowValue"> (3)</span>
        </a>
    </li>
    <li style="margin-left: 6px;">
        <a href="/s/ref=sr_nr_n_2?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A6&amp;bbn=1000&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327692925&amp;rnid=1000">
            <span class="refinementLink">Cookbooks, Food &amp; Wine</span><span class="narrowValue"> (2)</span>
        </a>
    </li>
</ul>
    

From this markup I want to extract the category names enclosed in the span tags, such as "Crafts, Hobbies & Home".

Answer

Wayne

Taking the provided expression at face value (that is, ignoring any contradictions between the expression and your description of it), you can use the following expression to exclude li elements that have a strong child:

    //div[@id="divid"]/ul[@id="ulid"]/li[not(strong)]/a
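A minimal PHP sketch of running that expression, assuming the question's $xpath is a DOMXPath from PHP's DOM extension (the question only shows the query call). The helper name extractLinkTexts is hypothetical; it loads the simplified markup from the question (with href in place of hre), runs the query, and returns the trimmed text of each matched a element.

```php
<?php
// Hypothetical helper: run an XPath expression over an HTML fragment and
// return the whitespace-trimmed text content of every matched node.
function extractLinkTexts(string $html, string $expr): array
{
    $doc = new DOMDocument();
    // Collect libxml parser warnings internally instead of printing them
    // (this fragment repeats id="spanid", which libxml may report).
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query($expr) as $node) {
        // textContent of <a> includes the indentation before <span>, so trim it.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = <<<'HTML'
<div id="divid">
<ul id="ulid">
<li style="margin-left: 8px">
        <strong>books</strong>
</li>
<li style="margin-left: 6px">
        <a href="">
        <span id="spanid">first line</span></a>
</li>
<li style="margin-left: 6px">
        <a href="">
        <span id="spanid">2nd line</span></a>
</li>
</ul>
</div>
HTML;

// The expression from the answer: skip any li that has a strong child.
$expr = '//div[@id="divid"]/ul[@id="ulid"]/li[not(strong)]/a';
print_r(extractLinkTexts($html, $expr));
```

On this simplified markup the original li/a query returns the same two strings, because the first li has no a child at all; the not(strong) predicate only changes the result when an li that holds a strong element also holds a link.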
    

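For the full markup in the question, li[not(strong)]/a is not enough on its own: the first li there has no strong child but does contain a link (the "Any Department" anchor), so that anchor would still match. The sketch below, assuming the same PHP DOM setup, instead targets the span elements whose class is refinementLink. The hrefs are shortened to "#" to keep the excerpt readable, and extractRefinements is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the category names out of the ref_1000 list.
function extractRefinements(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml parser warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Only spans with class="refinementLink" are selected, which skips the
    // "Any Department" link (span class="expand"), the <strong>Books</strong>
    // item, and the counts in span class="narrowValue".
    $nodes = $xpath->query('//ul[@id="ref_1000"]/li/a/span[@class="refinementLink"]');

    $names = [];
    foreach ($nodes as $span) {
        // &amp; in the source is decoded by the parser, so this holds a plain "&".
        $names[] = trim($span->textContent);
    }
    return $names;
}

// Excerpt of the question's markup; hrefs shortened to "#".
$html = <<<'HTML'
<ul data-typeid="n" id="ref_1000">
    <li style="margin-left: -18px;">
        <a href="#">‹ <span class="expand">Any Department</span></a>
    </li>
    <li style="margin-left: 8px;">
        <strong>Books</strong>
    </li>
    <li style="margin-left: 6px;">
        <a href="#">
            <span class="refinementLink">Crafts, Hobbies &amp; Home</span><span class="narrowValue"> (19)</span>
        </a>
    </li>
    <li style="margin-left: 6px;">
        <a href="#">
            <span class="refinementLink">Health, Fitness &amp; Dieting</span><span class="narrowValue"> (3)</span>
        </a>
    </li>
    <li style="margin-left: 6px;">
        <a href="#">
            <span class="refinementLink">Cookbooks, Food &amp; Wine</span><span class="narrowValue"> (2)</span>
        </a>
    </li>
</ul>
HTML;

print_r(extractRefinements($html));
```

The exact @class comparison assumes the attribute holds only refinementLink; if the page could add further classes to those spans, a contains() test on @class would be needed instead.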