directory

Reputation: 3167

XPath: get values where a specific text is present

With XPath, I am trying to get values out of the following HTML:

<ul class="listVideoAttributes alpha only">
    <li class="alpha only">
        <span>Categories:</span>
        <ul>
            <li class="psi alpha">
                <a href="#">Cinema</a>
            </li>
            <li class="omega">
                <a href="#">HD</a>
            </li>
        </ul>
    </li>
</ul>

The heading is not always "Categories"; sometimes it is "Tags".

I would like an XPath expression that locates the Categories heading and returns the category values, such as Cinema and HD.

For now, I'm using:

//ul[@class="listVideoAttributes"][contains(., 'Categories:')]

It returns the values, but also the text 'Categories:'.

I would like to do something like:

//ul[@class="listVideoAttributes"][contains(., 'Categories:')]/ul

But that does not seem to work.

Upvotes: 1

Views: 923

Answers (3)

Jens Erat

Reputation: 38732

Your XPath expression did not work because the inner <ul/> is not a direct child of the outer <ul/>. Use the descendant-or-self axis step //ul instead of the child axis step /ul at the end of your expression. If you are sure the markup will not change, it is better to use only child axis steps: /li/ul/li/a.

Another problem is that the @class attribute does not equal listVideoAttributes; it only contains it. Never compare HTML class attributes with equals; always use contains.
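
To illustrate the class comparison, here is a minimal Python sketch. It assumes the third-party lxml library is installed; the second list and its class name listVideoAttributesExtra are made up for the demonstration. It also shows a stricter, commonly used XPath 1.0 idiom that pads the attribute with spaces so that only a whole class name matches, which is an addition to the advice above rather than part of it.

```python
from lxml import html

# The first list carries the class value from the question; the second is a
# hypothetical list whose class merely starts with the same letters.
doc = html.fromstring("""
<div>
  <ul class="listVideoAttributes alpha only"><li>first</li></ul>
  <ul class="listVideoAttributesExtra"><li>second</li></ul>
</div>
""")

# Exact comparison fails: the attribute value is the whole string
# "listVideoAttributes alpha only", not just the first class name.
exact = doc.xpath('//ul[@class="listVideoAttributes"]')

# Substring test: finds the wanted list, but also the look-alike class.
substring = doc.xpath("//ul[contains(@class, 'listVideoAttributes')]")

# Space-padded token test: matches only the complete class name.
token = doc.xpath(
    "//ul[contains(concat(' ', normalize-space(@class), ' '), "
    "' listVideoAttributes ')]"
)

print(len(exact), len(substring), len(token))  # 0 2 1
```

If no other class on the page shares the prefix, the plain contains test is enough.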


Anyway, I'd be as specific as possible when searching for the "headline"; otherwise you could get false positives whenever any "listVideoAttributes" list mentions "Categories" or "Tags" somewhere in its content:

//ul[contains(@class, 'listVideoAttributes')]/li[contains(span, 'Categories') or contains(span, 'Tags')]//a
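
A rough way to check this expression from Python, assuming the third-party lxml library is installed. The extra "Uploader:" item and the helper name labels are invented for the sketch; they are there only to show what a looser test on the whole list would also pick up.

```python
from lxml import html

EXPR = (
    "//ul[contains(@class, 'listVideoAttributes')]"
    "/li[contains(span, 'Categories') or contains(span, 'Tags')]//a"
)
# Looser variant that tests the text of the whole list instead of the span.
LOOSE = "//ul[contains(@class, 'listVideoAttributes')][contains(., 'Categories')]//a"

def labels(markup, expr=EXPR):
    """Return the link texts selected by expr (hypothetical helper)."""
    return [a.text_content().strip() for a in html.fromstring(markup).xpath(expr)]

# Markup from the question plus a made-up second attribute ("Uploader:").
categories_page = """<ul class="listVideoAttributes alpha only">
  <li class="alpha only"><span>Categories:</span>
    <ul><li class="psi alpha"><a href="#">Cinema</a></li>
        <li class="omega"><a href="#">HD</a></li></ul></li>
  <li class="alpha only"><span>Uploader:</span>
    <ul><li><a href="#">someone</a></li></ul></li>
</ul>"""
tags_page = categories_page.replace("Categories:", "Tags:")

print(labels(categories_page))         # ['Cinema', 'HD']
print(labels(tags_page))               # ['Cinema', 'HD']
print(labels(categories_page, LOOSE))  # ['Cinema', 'HD', 'someone']
```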

You might want to append /text() if you cannot read the string value from the programming language you are using. Reading the string value is usually preferable, for example when a link contains bold text like <a href="..."><strong>foo</strong></a>; text() would not return the string value in that case.
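
The difference between the string value and /text() can be seen in a small Python sketch, again assuming the third-party lxml library is installed. The <strong> wrapper around "Cinema" is added here purely for illustration; in lxml, text_content() is one way to read an element's string value.

```python
from lxml import html

# Same structure as in the question, but the first link wraps its label
# in <strong> (an illustrative change, not part of the original markup).
doc = html.fromstring("""<ul class="listVideoAttributes alpha only">
  <li class="alpha only"><span>Categories:</span>
    <ul><li class="psi alpha"><a href="#"><strong>Cinema</strong></a></li>
        <li class="omega"><a href="#">HD</a></li></ul></li>
</ul>""")

expr = (
    "//ul[contains(@class, 'listVideoAttributes')]"
    "/li[contains(span, 'Categories') or contains(span, 'Tags')]//a"
)

# String value of each link: includes text inside nested elements.
string_values = [a.text_content() for a in doc.xpath(expr)]

# text() selects only text nodes that are direct children of <a>, so the
# label inside <strong> is missed.
text_nodes = [str(t) for t in doc.xpath(expr + "/text()")]

print(string_values)  # ['Cinema', 'HD']
print(text_nodes)     # ['HD']
```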

Upvotes: 1

Ian Roberts

Reputation: 122424

There are two problems with

//ul[@class="listVideoAttributes"][contains(., 'Categories:')]/ul

First, the outer ul's class is not equal to "listVideoAttributes"; it only contains that as a substring. Second, the inner ul is not a direct child of the outer one; it is a grandchild. How about:

//ul[contains(@class, 'listVideoAttributes')][contains(., 'Categories')]/li/ul/li/a
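
Both problems can be reproduced against the markup from the question with a short Python sketch, assuming the third-party lxml library is installed. The variable names are arbitrary.

```python
from lxml import html

# Markup copied from the question.
doc = html.fromstring("""<ul class="listVideoAttributes alpha only">
    <li class="alpha only">
        <span>Categories:</span>
        <ul>
            <li class="psi alpha"><a href="#">Cinema</a></li>
            <li class="omega"><a href="#">HD</a></li>
        </ul>
    </li>
</ul>""")

# Original attempt: the class equality never matches.
original = doc.xpath(
    """//ul[@class="listVideoAttributes"][contains(., 'Categories:')]/ul"""
)

# Fixing only the class test is not enough: /ul looks for a direct child,
# but the inner list sits one level deeper, inside the <li>.
class_fixed = doc.xpath(
    "//ul[contains(@class, 'listVideoAttributes')][contains(., 'Categories:')]/ul"
)

# Both fixes applied, spelling out the full child path down to the links.
fixed = doc.xpath(
    "//ul[contains(@class, 'listVideoAttributes')][contains(., 'Categories')]"
    "/li/ul/li/a"
)

print(original)                 # []
print(class_fixed)              # []
print([a.text for a in fixed])  # ['Cinema', 'HD']
```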

Upvotes: 0

Arup Rakshit

Reputation: 118299

You can try the XPath below:

//ul[contains(@class,'listVideoAttributes') and contains(.//span,'Categories')]//a/text()

output:

Cinema
HD

Upvotes: 0
