An SO User

Reputation: 24998

Getting started with XPath

I am self-studying XPath from Pro XML Development with Java. Just for practice I have constructed a sample XML document and some XPath expressions.
Below are a few XPath expressions along with their explanations and a few related questions. Please correct me if my explanations are wrong and answer the questions wherever applicable.

XML

<?xml version="1.0" encoding="UTF-8" ?>
<people>
    <student scholarship="Yes">
        <name>John</name>
        <course>Computer Technology</course>
        <semester>6</semester>
        <scheme>E</scheme>
    </student>

    <student>
        <name>Foo</name>
        <course>Industrial Electronics</course>
        <semester>6</semester>
        <scheme>E</scheme>
    </student>

    <grumpy-cat>
        <soup-noodle>
            <student>
                <name>Dingle</name>
                <course>Grumpiness</course>
                <semester>3</semester>
                <scheme>E</scheme>
            </student>
        </soup-noodle>
    </grumpy-cat>
</people>  

Expression 1: /people/student[@scholarship='Yes']/name
Explanation: Will select the elements <name>..</name> which are contained in <people> such that <student> has an attribute named scholarship with a value of Yes
Question: Will this also select the value John in it ????

Expression 2: /people/student[2]
Explanation: Will select the element <student>..</student> which is at the 2nd position in the element <people>
Question: Will it also select the child nodes within ?

Expression 3: /people/student/@scholarship
Explanation: Will select the attribute scholarship in the element student. If there were multiple <student scholarship=""> then it would select multiple attributes

Expression 4: //name[ancestor::student]
Explanation: Will select all the <name>..</name> elements
// means 'all-the-descendants'. In my context it means 'I don't care who the descendants are as long as my immediate ancestor is student'

Upvotes: 1

Views: 237

Answers (2)

Dimitre Novatchev

Reputation: 243469

Expression 1: /people/student[@scholarship='Yes']/name
Explanation: Will select the elements <name>..</name> which are contained in <people> such that <student> has an attribute named scholarship with a value of Yes
Question: Will this also select the value John in it ????

This expression selects every name element that is a child of a student element whose scholarship attribute has the string value "Yes", where that student element is itself a child of the top element (named people) of the XML document. The comparison is case-sensitive, so a value of "yes" would not match. XPath doesn't select "values" -- it selects nodes. In this case the string "John" is the string value of the selected name element. The selected name element has a single child text node, whose string value is "John".
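A minimal sketch of the node-versus-value distinction, assuming the JDK's built-in javax.xml.xpath API. The class and method names are made up for illustration, and the XML is a trimmed, whitespace-free copy of the document in the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNodeVersusValue {
    // Assumption: trimmed, whitespace-free copy of the question's XML.
    static final String XML =
          "<people>"
        + "<student scholarship=\"Yes\"><name>John</name></student>"
        + "<student><name>Foo</name></student>"
        + "<grumpy-cat><soup-noodle><student><name>Dingle</name></student>"
        + "</soup-noodle></grumpy-cat>"
        + "</people>";

    static final String EXPR = "/people/student[@scholarship='Yes']/name";

    // Evaluates EXPR as a node-set: the result is a list of nodes, not strings.
    static NodeList selectNodes() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(XML));
            return (NodeList) xpath.evaluate(EXPR, source, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static int selectedCount() {
        return selectNodes().getLength();
    }

    // Name of the first selected node (an element node).
    static String firstNodeName() {
        Node node = selectNodes().item(0);
        return node.getNodeType() == Node.ELEMENT_NODE ? node.getNodeName() : null;
    }

    // Value of the text node that is the only child of the selected element.
    static String firstChildText() {
        Node child = selectNodes().item(0).getFirstChild();
        return child.getNodeType() == Node.TEXT_NODE ? child.getNodeValue() : null;
    }

    // Evaluating as a string returns the string value of the first selected node.
    static String stringValue() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(EXPR, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("selected nodes: " + selectedCount());
        System.out.println("node name:      " + firstNodeName());
        System.out.println("child text:     " + firstChildText());
        System.out.println("string value:   " + stringValue());
    }
}
```

With this input the node-set holds one element named name; "John" only shows up when you ask for the element's text child or its string value.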

Expression 2: /people/student[2]
Explanation: Will select the element <student>..</student> which is at the 2nd position in the element <people>
Question: Will it also select the child nodes within ?

This selects the second (in document order) student child of the top element (whose name must be people). The child nodes of the selected element are not selected themselves. The number of selected nodes can be obtained using the count() function:

count(/people/student[2])

and it is 1 -- this means that only the element (but not its children or descendants) is selected.
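The count() check can be reproduced from Java. This is only a sketch, assuming the JDK's javax.xml.xpath API. The class name XPathCountDemo and the helper number() are hypothetical, and the XML is a whitespace-free copy of the question's document.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathCountDemo {
    // Assumption: whitespace-free copy of the question's XML.
    static final String XML =
          "<people>"
        + "<student scholarship=\"Yes\"><name>John</name>"
        + "<course>Computer Technology</course><semester>6</semester>"
        + "<scheme>E</scheme></student>"
        + "<student><name>Foo</name><course>Industrial Electronics</course>"
        + "<semester>6</semester><scheme>E</scheme></student>"
        + "<grumpy-cat><soup-noodle><student><name>Dingle</name>"
        + "<course>Grumpiness</course><semester>3</semester>"
        + "<scheme>E</scheme></student></soup-noodle></grumpy-cat>"
        + "</people>";

    // Evaluates an XPath expression whose result is a number, e.g. count(...).
    static double number(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(XML));
            return (Double) xpath.evaluate(expr, source, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the student element itself is selected.
        System.out.println(number("count(/people/student[2])"));
        // Its child elements need their own step to be selected.
        System.out.println(number("count(/people/student[2]/*)"));
        // The nested student is not a child of people, so it is not counted here.
        System.out.println(number("count(/people/student)"));
    }
}
```

The child elements are still reachable from the selected node; they are just not members of the result unless a further step such as /* selects them.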

Expression 3: /people/student/@scholarship
Explanation: Will select the attribute scholarship in the element student. If there were multiple <student scholarship=""> then it would select multiple attributes

This selects the scholarship attribute of any student element that is a child of the top element (whose name must be people). This means that if there are N student elements that are children of the people top element, and if each of these has a scholarship attribute, then N scholarship attributes will be selected.
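A small sketch of the attribute case, again assuming the JDK's javax.xml.xpath API. SAMPLE is a trimmed copy of the question's XML. VARIANT is a hypothetical document, not from the question, in which every student carries a scholarship attribute.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAttributeDemo {
    // Assumption: trimmed copy of the question's XML (one scholarship attribute).
    static final String SAMPLE =
          "<people>"
        + "<student scholarship=\"Yes\"><name>John</name></student>"
        + "<student><name>Foo</name></student>"
        + "<grumpy-cat><soup-noodle><student><name>Dingle</name></student>"
        + "</soup-noodle></grumpy-cat>"
        + "</people>";

    // Hypothetical variant: all three students have the attribute.
    static final String VARIANT =
          "<people>"
        + "<student scholarship=\"Yes\"><name>John</name></student>"
        + "<student scholarship=\"No\"><name>Foo</name></student>"
        + "<grumpy-cat><soup-noodle>"
        + "<student scholarship=\"Yes\"><name>Dingle</name></student>"
        + "</soup-noodle></grumpy-cat>"
        + "</people>";

    // Number of attribute nodes selected by /people/student/@scholarship.
    static int countScholarshipAttributes(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attrs = (NodeList) xpath.evaluate(
                "/people/student/@scholarship",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            return attrs.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countScholarshipAttributes(SAMPLE));
        System.out.println(countScholarshipAttributes(VARIANT));
    }
}
```

In the variant only two attributes are selected, because the third student is nested inside grumpy-cat and is therefore not a child of people.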

Expression 4: //name[ancestor::student]
Explanation: Will select all the <name>..</name> elements
// means 'all-the-descendants'. In my context it means 'I don't care who the descendants are as long as my immediate ancestor is student'

This selects all name elements that have a student ancestor. That ancestor need not be the immediate parent; it can also be any ancestor of the parent.

Here one can write an equivalent XPath expression that doesn't contain any reverse axes:

//student//name

In case you wanted to select all name elements whose parent is a student element, one way to express this is:

//student/name
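To see the three expressions side by side, here is a sketch assuming the JDK's javax.xml.xpath API. The XML is deliberately hypothetical: the details and teacher elements do not appear in the question and are added only so that the ancestor and parent forms give different results.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAncestorVersusParent {
    // Hypothetical document: one name sits two levels below its student,
    // and one name has no student ancestor at all.
    static final String XML =
          "<people>"
        + "<student><name>John</name></student>"
        + "<student><details><name>Nested</name></details></student>"
        + "<teacher><name>Smith</name></teacher>"
        + "</people>";

    // Number of nodes selected by the given expression.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "//name",                    // every name element
            "//name[ancestor::student]", // names with a student ancestor
            "//student//name",           // same nodes, no reverse axis
            "//student/name"             // names whose parent is a student
        };
        for (String expr : expressions) {
            System.out.println(expr + " -> " + count(expr));
        }
    }
}
```

On this input the ancestor form and //student//name agree, while //student/name drops the name nested inside details.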

Finally, I would recommend using a tool like the XPath Visualizer (which I created 12 years ago) that has helped many thousands of people learn XPath by playing and having fun.

Upvotes: 2

Martin Honnen

Reputation: 167471

All four of your XPath expressions select nodes in the input tree. In XPath 1.0 such expressions return a set of nodes (which can be empty or contain one or more nodes of the input tree); in XPath 2.0 they return a sequence of nodes (which, again, can be empty or contain one or more nodes of the input tree).

  1. Your first expression selects one name element node in the given input tree. This node contains a single text node with the value John.
  2. Your second expression selects a student element node in the input tree. That student element node has several child nodes, but XPath selection simply selects a node in the input tree; it does not modify anything or create new nodes.
  3. Your third expression selects a scholarship attribute node. You are right that it would select several such nodes if the input XML contained several student element nodes with scholarship attributes.
  4. Your fourth expression //name[ancestor::student] is a short form (see http://www.w3.org/TR/xpath/#path-abbrev) of /descendant-or-self::node()/name[ancestor::student], which in turn is a short form of /descendant-or-self::node()/child::name[ancestor::student]. So it selects all name child elements of the root node and of every descendant node of the root node, provided those name elements have a student ancestor element node. Your explanation of that expression is wrong, both the part about 'all-the-descendants' (which is at least imprecise) and the part about 'my immediate ancestor is student'. The immediate ancestor is the parent, expressed simply as parent::student in XPath, whereas your ancestor::student looks up all levels of ancestors. And 'all the descendants' would be /descendant::name. On the other hand, given the way // is defined and your next step name, //name boils down to the same as /descendant::name.
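The abbreviation and axis points in item 4 can be checked with a short sketch, assuming the JDK's javax.xml.xpath API. The class name and the count() helper are hypothetical, and the XML is a trimmed, whitespace-free copy of the document in the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAbbreviationDemo {
    // Assumption: trimmed copy of the question's XML.
    static final String XML =
          "<people>"
        + "<student scholarship=\"Yes\"><name>John</name></student>"
        + "<student><name>Foo</name></student>"
        + "<grumpy-cat><soup-noodle><student><name>Dingle</name></student>"
        + "</soup-noodle></grumpy-cat>"
        + "</people>";

    // Number of nodes selected by the given expression.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated form, its full expansion, and the descendant:: form.
        System.out.println(count("//name[ancestor::student]"));
        System.out.println(count(
            "/descendant-or-self::node()/child::name[ancestor::student]"));
        System.out.println(count("/descendant::name[ancestor::student]"));

        // ancestor:: looks up all levels; parent:: looks up exactly one.
        System.out.println(count("//student[ancestor::grumpy-cat]"));
        System.out.println(count("//student[parent::grumpy-cat]"));
        System.out.println(count("//student[parent::soup-noodle]"));
    }
}
```

The first three expressions select the same three name elements. The nested student has grumpy-cat as an ancestor but soup-noodle as its parent, which is why the parent::grumpy-cat test selects nothing.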

Upvotes: 2
