Getting values from repeating child nodes using xpath

Question

I am building a Java application that uses XPath to extract the values inside the table tags of a page.

Please suggest an efficient way to get all 200 values from the page. My code works perfectly for the 100 rows within the first dataTable, but I have no way to get to the second dataTable.

I am able to extract the first 100 rows using the Java class shown further down.

The expected output:

http://a.com/   data for a  526735  Z
http://b.com/   data for b  522273  Z
...

http://c.com/   data for c  578335  Z
http://d.com/   data for d  513445  Z
...
(100 rows here)
.
  

 
The cell values in the page's tables:

data for a
526735
Z

data for b
522273
B


data for c
526735
Z

data for d
522273
B
...
(100 rows here)

This is the class used to get the data.

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class CompaniesGetter {
public static void main(String[] args) throws Exception{
    String name,link,scripcode,group,s,key;
    int a=1;
    int count=1;
    URL oracle = new URL("http://money.rediff.com/companies");
    URLConnection yc = oracle.openConnection();
    // Read from the connection opened above; a second openStream() call is not needed.
    InputStream is = yc.getInputStream();
    Tidy tidy = new Tidy();
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    Document tidyDOM = tidy.parseDOM(is, null);
    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xPath = xPathFactory.newXPath();
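    Map<String, String> mLink=new HashMap<String, String>();
    Map<String, String> mCode=new HashMap<String, String>();
    Map<String, String> mGroup=new HashMap<String, String>();
    ArrayList<String> aName=new ArrayList<String>();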
    //for(int j=0;j<2;j++)
    for(int i =1;i<=200;i++)
    {
        // Rows 101-200 are expected to come from the second table.
        if(i==101)
        {
            a=2;
        }
        link = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a/@href";
        name = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a";
        scripcode = "//table[@class='dataTable']/tbody/tr["+i+"]/td[2]";
        group = "//table[@class='dataTable']/tbody/tr["+i+"]/td[3]";
        String linkValue = (String)xPath.evaluate(link, tidyDOM, XPathConstants.STRING);
        String nameValue = (String)xPath.evaluate(name, tidyDOM, XPathConstants.STRING);
        String scripValue = (String)xPath.evaluate(scripcode, tidyDOM, XPathConstants.STRING);
        String groupValue = (String)xPath.evaluate(group, tidyDOM, XPathConstants.STRING);
        aName.add(nameValue);
        mLink.put(nameValue, linkValue);
        mCode.put(nameValue, scripValue);
        mGroup.put(nameValue,groupValue);
    }
    Iterator<String> itr=aName.iterator();
    while (itr.hasNext()){
        key=itr.next();
        System.out.println("::"+(count++)+" "+key + "  "+mLink.get(key)+"   "+mCode.get(key)+"   "+mGroup.get(key)+" ::");
    }

}

}

kisp · Accepted Answer

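Just a tip: do you actually use the variable "a" in your XPath expressions? It is assigned but never referenced, so the expressions never select the second table.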

link = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a/@href";

should be

link = "//table[@class='dataTable'][" + a + "]/tbody/tr["+i+"]/td/a/@href";
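
An alternative that avoids tracking table and row indexes at all is to select every row of every dataTable as a node set and then evaluate short relative expressions against each row. The sketch below is only an illustration under some assumptions: the class name AllRowsExample, the SAMPLE markup, and the helpers parse and extractRows are made up for this example, and it uses the standard JAXP parser on a tiny well-formed snippet instead of JTidy so that it runs on its own. The same extractRows method should accept the tidyDOM document from the question, since it only needs an org.w3c.dom.Document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AllRowsExample {
    // Hypothetical stand-in for the real page: two tables of class
    // dataTable, each sitting under a different parent element.
    static final String SAMPLE =
        "<html><body>"
      + "<div><table class='dataTable'><tbody>"
      + "<tr><td><a href='http://a.com/'>data for a</a></td><td>526735</td><td>Z</td></tr>"
      + "<tr><td><a href='http://b.com/'>data for b</a></td><td>522273</td><td>Z</td></tr>"
      + "</tbody></table></div>"
      + "<div><table class='dataTable'><tbody>"
      + "<tr><td><a href='http://c.com/'>data for c</a></td><td>578335</td><td>Z</td></tr>"
      + "<tr><td><a href='http://d.com/'>data for d</a></td><td>513445</td><td>Z</td></tr>"
      + "</tbody></table></div>"
      + "</body></html>";

    // Parses a well-formed markup string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns one "link name code group" line per table row, in document order.
    static List<String> extractRows(Document dom) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // One query returns the rows of all matching tables.
        NodeList rows = (NodeList) xPath.evaluate(
            "//table[@class='dataTable']/tbody/tr", dom, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            // Relative expressions are evaluated with the row as context node.
            String link = xPath.evaluate("td/a/@href", row);
            String name = xPath.evaluate("td/a", row);
            String code = xPath.evaluate("td[2]", row);
            String group = xPath.evaluate("td[3]", row);
            result.add(link + " " + name + " " + code + " " + group);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : extractRows(parse(SAMPLE))) {
            System.out.println(line);
        }
    }
}
```

If you prefer to keep the indexed expressions, two details are worth checking. In XPath 1.0, //table[@class='dataTable'][2] selects a table that is the second matching child of its own parent, so it finds nothing when the two tables have different parents; (//table[@class='dataTable'])[2] selects the second matching table in the whole document. The row index also has to restart at 1 for the second table, because tr[101] does not exist in a table with 100 rows.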
