Extracting Table Data with JSoup on Yahoo Finance

Question

I'm practicing extracting data from tables using JSoup, but I can't figure out why I can't pull the "Shares Outstanding" field from

https://finance.yahoo.com/q/ks?s=AAPL+Key+Statistics

Here are two attempts, where 's' is AAPL:

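import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class YahooStatistics {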
String sharesOutstanding = "Shares Outstanding:";

public YahooStatistics(String s)    {
    String keyStatisticsURL = ("https://finance.yahoo.com/q/ks?s="+s+"+Key+Statistics");

//Attempt 1       
    try {
        Document doc = Jsoup.connect(keyStatisticsURL).get();

        for (Element table : doc.select("table.yfnc_datamodoutline1"))  {
            for (Element row : table.select("tr"))  {
                Elements tds = row.select("td");
                for (Element td : tds.select(sharesOutstanding)) {
                    System.out.println(td.ownText());
                }
            }
        }
    }
    catch (IOException ex)   {
        ex.printStackTrace();
    }

//Attempt 2

    try {
        Document doc = Jsoup.connect(keyStatisticsURL).get();

        for (Element table : doc.select("table.yfnc_datamodoutline1"))    {
            for (Element row : table.select("tr"))   {
                Elements tds = row.select("td");
                for (int j = 0; j < tds.size() - 1; j++) {
                    Element td = tds.get(j);
                    if ((td.ownText()).equals(sharesOutstanding)) {
                        System.out.println(tds.get(j+1).ownText());
                    }
                }
            }
        }
    }
    catch (IOException ex)   {
        ex.printStackTrace();
    }
}   // end of constructor
}   // end of class

The attempts return: BUILD SUCCESSFUL and nothing else.

I've disabled JavaScript in my browser and the table still shows, so I'm assuming the table is plain HTML rather than something generated by JavaScript.

Any suggestions are appreciated.

RealSkeptic · Accepted Answer

Notes about your source after the edit:

1. You should compare ownText() rather than text(). text() gives you the combined text of the element and all its sub-elements. In this case the element contains Shares Outstanding⁵:, so its combined text is "Shares Outstanding5:". If you use ownText() it will just be "Shares Outstanding:".
2. Note the colon (:). Update the value in sharesOutstanding accordingly.
3. You are passing it the wrong URL. There should be a + following the AAPL.
4. Your current query (at least the second attempt) returns the element twice: because there is a nested table, it finds the TDs twice.
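
The first and last of these points can be reproduced offline. What follows is only a rough sketch: the HTML fragment is a hand-written stand-in for the Yahoo markup (the class names come from the code in the question, while the nesting, the footnote tags, and the value 1.23B are assumptions made up for illustration), and OwnTextDemo, labelCell and countLoopMatches are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class OwnTextDemo {
    // Hand-written stand-in for the page: an outer table wrapping an inner one.
    // The value 1.23B is invented for this example.
    static final String HTML =
        "<table class=\"yfnc_datamodoutline1\"><tr><td>"
      + "<table><tr>"
      + "<td class=\"yfnc_tablehead1\">Shares Outstanding<font size=\"-1\"><sup>5</sup></font>:</td>"
      + "<td class=\"yfnc_tabledata1\">1.23B</td>"
      + "</tr></table>"
      + "</td></tr></table>";

    // Returns the label cell so text() and ownText() can be compared on it.
    static Element labelCell() {
        return Jsoup.parse(HTML).select("td.yfnc_tablehead1").first();
    }

    // Runs the nested loops from the second attempt and counts the matches.
    static int countLoopMatches(String label) {
        Document doc = Jsoup.parse(HTML);
        int matches = 0;
        for (Element table : doc.select("table.yfnc_datamodoutline1")) {
            // select("tr") also returns the rows of the nested table
            for (Element row : table.select("tr")) {
                // select("td") returns every descendant td, not only direct children
                Elements tds = row.select("td");
                for (int j = 0; j < tds.size() - 1; j++) {
                    if (tds.get(j).ownText().equals(label)) {
                        matches++;
                    }
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Element label = labelCell();
        System.out.println("text():       " + label.text());
        System.out.println("ownText():    " + label.ownText());
        System.out.println("loop matches: " + countLoopMatches("Shares Outstanding:"));
    }
}
```

On this fragment, text() includes the footnote digit while ownText() does not, and the loops reach the label cell once through the outer row and once through the inner row.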

You can either break from your loops once you have found a match, go back to your original version (with corrections as above) - see note - or you can try using a more sophisticated query which will only match once:

Elements elems = doc.select("td.yfnc_tablehead1:containsOwn("+sharesOutstanding+") + td.yfnc_tabledata1");

if ( ! elems.isEmpty() ) {
    System.out.println( elems.get(0).ownText() );
}

This selector gives you all the td elements whose class is yfnc_tabledata1 and whose immediately preceding sibling is a td element with class yfnc_tablehead1 whose own text contains the "Shares Outstanding:" string. This should basically select the exact TD you need.
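
To try the selector without hitting the network, something like the following sketch may help. It parses a hand-written fragment rather than the real page: the class names are the ones used above, but the nesting, the second row ("Float:") and both values are assumptions for illustration, and SiblingSelectorDemo and selectValueCells are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SiblingSelectorDemo {
    // Hand-written stand-in for the page; the values are invented.
    static final String HTML =
        "<table class=\"yfnc_datamodoutline1\"><tr><td>"
      + "<table>"
      + "<tr><td class=\"yfnc_tablehead1\">Shares Outstanding<font size=\"-1\"><sup>5</sup></font>:</td>"
      + "<td class=\"yfnc_tabledata1\">1.23B</td></tr>"
      + "<tr><td class=\"yfnc_tablehead1\">Float:</td>"
      + "<td class=\"yfnc_tabledata1\">4.56B</td></tr>"
      + "</table>"
      + "</td></tr></table>";

    // Selects the data cell that directly follows the header cell
    // whose own text contains the given label.
    static Elements selectValueCells(String html, String label) {
        Document doc = Jsoup.parse(html);
        return doc.select(
            "td.yfnc_tablehead1:containsOwn(" + label + ") + td.yfnc_tabledata1");
    }

    public static void main(String[] args) {
        Elements elems = selectValueCells(HTML, "Shares Outstanding:");
        if (!elems.isEmpty()) {
            System.out.println(elems.get(0).ownText());
        }
    }
}
```

Because the selector is applied once to the whole document instead of once per row, the nested table does not cause a duplicate match here. Note that the label is spliced into the selector as-is, so a label containing parentheses would probably need extra care.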

Note: the previous version of this answer was a long ramble about the difference between Elements.select() and Element.select(). It turns out that I was dead wrong and your original version should have worked - if you had corrected the four points above. So to set the record straight: select() on an Elements actually does look inside each element, and the resulting list may contain descendants of any of the elements in the original list that match the selection. Sorry about that.
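
A tiny sketch of that behaviour, using a made-up two-cell table (the markup and the names ElementsSelectDemo and countBoldInsideCells are hypothetical, chosen only for this illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class ElementsSelectDemo {
    // Made-up fragment: only the first cell has a nested <b> element.
    static final String HTML =
        "<table><tr><td><b>bold</b></td><td>plain</td></tr></table>";

    // Calls select() on an Elements list of td cells and counts the
    // <b> elements found inside them.
    static int countBoldInsideCells() {
        Elements tds = Jsoup.parse(HTML).select("td");
        return tds.select("b").size();
    }

    public static void main(String[] args) {
        System.out.println("b elements found inside the td list: " + countBoldInsideCells());
    }
}
```

The b element is a descendant of one of the cells, not a member of the list itself, yet select() on the list still finds it.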
