GJ.

Reputation: 5364

How to iterate through DOM elements that match a css class using xpath?

I'm processing an HTML page with a variable number of p elements with a css class "myclass", using Python + Selenium RC.

When I try to select each node with this xpath:

//p[@class='myclass'][n]

(with n a natural number)

I get only the first p element with this css class for every n, unlike the situation if I iterate through selecting ALL p elements with:

//p[n]

Is there any way I can iterate through elements by css class using xpath?

Upvotes: 9

Views: 3605

Answers (5)

Dimitre Novatchev

Reputation: 243469

Now that I look again at this question, I think the real problem is not in iterating, but in using //.

This is a FAQ:

//p[@class='myclass'][1] 

selects every p element that has a class attribute with value "myclass" and that is the first such child of its parent. Therefore this expression may select many p elements, none of which is really the first such p element in the document.

When we want to get the first p element in the document that satisfies the above predicate, one correct expression is:

(//p)[@class='myclass'][1] 

Remember: The [] operator has a higher priority (precedence) than the // abbreviation. Whenever you need to index the nodes selected by //, always put the expression to be indexed in parentheses.

Here is a demonstration:

<nums>
 <a>
  <n x="1"/>
  <n x="2"/>
  <n x="3"/>
  <n x="4"/>
 </a>
 <b>
  <n x="5"/>
  <n x="6"/>
  <n x="7"/>
  <n x="8"/>
 </b>
</nums>

The XPath expression:

//n[@x mod 2 = 0][1]

selects the following two nodes:

<n x="2" />
<n x="6" />

The XPath expression:

(//n)[@x mod 2 = 0][1]

selects exactly the first n element in the document with the wanted property:

<n x="2" />

Try this first with the following transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="//n[@x mod 2 = 0][1]"/>
 </xsl:template>
</xsl:stylesheet>

and the result is two nodes.

<n x="2" />
<n x="6" />

Now, change the XPath expression as below and try again:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="(//n)[@x mod 2 = 0][1]"/>
 </xsl:template>
</xsl:stylesheet>

and the result is what we really wanted -- the first such n element in the document:

<n x="2" />
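
For readers working from Python, the same contrast can be sketched with the standard library's xml.etree.ElementTree. This is only an approximation: ElementTree implements a limited XPath subset (no mod, no parenthesized expressions), so the even-number filter and the document-wide indexing are done in plain Python here, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<nums>
 <a><n x="1"/><n x="2"/><n x="3"/><n x="4"/></a>
 <b><n x="5"/><n x="6"/><n x="7"/><n x="8"/></b>
</nums>"""

root = ET.fromstring(XML)

# A position predicate after // is evaluated per parent:
# this yields the first n child of <a> AND the first n child of <b>.
first_per_parent = [n.get("x") for n in root.findall(".//n[1]")]

# To index across the whole document, select first, then index the
# flat, document-ordered list in the host language.
all_n = root.findall(".//n")
evens = [n for n in all_n if int(n.get("x")) % 2 == 0]

print(first_per_parent)   # ['1', '5']
print(evens[0].get("x"))  # 2
```

The second half plays the role of the parenthesized expression: the whole node list is built before any index is applied.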

Upvotes: 1

Hector M

Reputation: 1

Here's a C# code snippet that might help you out.

The key here is the Selenium function GetXpathCount(). It returns the number of nodes that match the XPath expression you are looking for.

You can enter //p[@class='myclass'] in XPather or any other XPath analysis tool to verify that multiple results are returned. Then you just iterate through the results in your code.

In my case, I needed to iterate over all the list items in a ul (i.e. //li[@class='myclass']/ul/li), so for your requirements it should be something like:

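int numProductsInLeftNav = Convert.ToInt32(selenium.GetXpathCount("//p[@class='myclass']"));

List<string> productsInLeftNav = new List<string>();
for (int i = 1; i <= numProductsInLeftNav; i++) {
    // Parenthesize so that [i] indexes the whole node-set, not each parent's children.
    // The explicit "xpath=" prefix is needed because the locator no longer starts with "//".
    string productName = selenium.GetText("xpath=(//p[@class='myclass'])[" + i + "]");
    productsInLeftNav.Add(productName);
}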
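
Since the question uses Python, here is a rough equivalent of the same count-then-index loop. It assumes a Selenium RC client object exposing get_xpath_count() and get_text(), as the Python RC bindings do. The helper name texts_by_xpath and the FakeSelenium stand-in are hypothetical and exist only so the sketch runs without a browser.

```python
def texts_by_xpath(sel, xpath):
    """Return the text of every node matching xpath, in document order.

    sel is assumed to be a Selenium RC client (get_xpath_count, get_text).
    """
    count = int(sel.get_xpath_count(xpath))
    # Wrap the expression in parentheses so [i] indexes the whole node-set;
    # "xpath=" is spelled out because the locator no longer starts with "//".
    return [sel.get_text("xpath=(%s)[%d]" % (xpath, i))
            for i in range(1, count + 1)]


class FakeSelenium:
    """Hypothetical stand-in for a Selenium RC session (illustration only)."""

    def __init__(self, texts):
        self._texts = texts
        self.locators = []  # records every locator it was asked for

    def get_xpath_count(self, xpath):
        return len(self._texts)

    def get_text(self, locator):
        self.locators.append(locator)
        index = int(locator.rsplit("[", 1)[1].rstrip("]"))
        return self._texts[index - 1]


sel = FakeSelenium(["one", "two", "three"])
result = texts_by_xpath(sel, "//p[@class='myclass']")
print(result)
print(sel.locators[0])
```

With a real session you would pass the selenium object in place of the stand-in; each text costs one round trip to the server, so this approach suits short lists.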

Upvotes: 0

Ryley

Reputation: 21226

I don't think you're using the "index" for its real purpose. The //p[selection][index] syntax is actually saying which position the element must have among the matching children of its parent. So //p[selection][1] says that your selected p must be the first p within its parent that matches the selection, and //p[selection][2] says it must be the second. Depending on your HTML, it's likely this isn't what you want.

Given that you're using Selenium and Python, there are a couple of ways to do what you want, and you can look at this question to see them (there are two options given there, one in Selenium JavaScript, the other using the server-side Selenium calls).

Upvotes: 0

Sergii Pozharov

Reputation: 17828

Maybe all your p elements with this class are at the same level (children of the same parent), so //p[@class='myclass'] gives you the array of paragraphs with the specified class. In that case you can iterate through it using indexes, i.e. //p[@class='myclass'][1], //p[@class='myclass'][2], ..., //p[@class='myclass'][last()]

Upvotes: 0

Dimitre Novatchev

Reputation: 243469

XPath 1.0 doesn't provide an iterating construct.

Iteration can be performed on the selected node-set in the language that is hosting XPath.

Examples:

In XSLT 1.0:

   <xsl:for-each select="someExpressionSelectingNodes">
     <!-- Do something with the current node -->
   </xsl:for-each>

In C#:

using System;
using System.IO;
using System.Xml;

public class Sample {

  public static void Main() {

    XmlDocument doc = new XmlDocument();
    doc.Load("booksort.xml");

    XmlNodeList nodeList;
    XmlNode root = doc.DocumentElement;

    nodeList=root.SelectNodes("descendant::book[author/last-name='Austen']");

    //Change the price on the books.
    foreach (XmlNode book in nodeList)
    {
      book.LastChild.InnerText="15.95";
    }

    Console.WriteLine("Display the modified XML document....");
    doc.Save(Console.Out);

  }
}
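
In Python (the host language in the question), a comparable sketch using the standard library's xml.etree.ElementTree, which supports only a subset of XPath 1.0. The sample markup is made up for illustration, and real-world HTML that is not well-formed XML would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div><p class="myclass">one</p><p>skip</p></div>
<div><p class="myclass">two</p><p class="myclass">three</p></div>
</body></html>"""

root = ET.fromstring(PAGE)

# Select the whole set once, then let the host language do the iterating.
texts = []
for p in root.findall(".//p[@class='myclass']"):
    texts.append(p.text)

print(texts)  # ['one', 'two', 'three']
```

The loop visits the matches in document order regardless of which parent each p sits under, so no positional predicate is needed.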

XPath 2.0 has its own iteration construct:

   for $varname1 in someExpression1,
       $varname2 in someExpression2, 
      .  .  .  .  .  .  .  .  .  .  .
       $varnameN in someExpressionN 
    return
        SomeExpressionUsingTheVarsAbove

Upvotes: 2
