Reputation:
Trying to practice extracting data from tables using JSoup. Can't figure out why I can't pull the "Shares Outstanding" field from
https://finance.yahoo.com/q/ks?s=AAPL+Key+Statistics
Here's two attempts where 's' is AAPL:
public class YahooStatistics {
String sharesOutstanding = "Shares Outstanding:";
public YahooStatistics(String s) {
String keyStatisticsURL = ("https://finance.yahoo.com/q/ks?s="+s+"+Key+Statistics");
//Attempt 1
try {
Document doc = Jsoup.connect(keyStatisticsURL).get();
for (Element table : doc.select("table.yfnc_datamodoutline1")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
for (Element td : tds.select(sharesOutstanding)) {
System.out.println(td.ownText());
}
}
}
}
catch (IOException ex) {
ex.printStackTrace();
}
//Attempt 2
try {
Document doc = Jsoup.connect(keyStatisticsURL).get();
for (Element table : doc.select("table.yfnc_datamodoutline1")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
for (int j = 0; j < tds.size() - 1; j++) {
Element td = tds.get(j);
if ((td.ownText()).equals(sharesOutstanding)) {
System.out.println(tds.get(j+1).ownText());
}
}
}
}
}
catch(IOException ex) {
ex.printStackTrace();
}
The attempts return: BUILD SUCCESSFUL and nothing else.
I've disabled JavaScript on my browser and the table still shows, so I'm assuming this is not written in JavaScript but HTML.
Any suggestions are appreciated.
Upvotes: 0
Views: 520
Reputation: 34638
Notes about your source after the edit:
ownText()
rather than text()
. text()
gives you the combined text of all the element and all its sub-elements. In this case the element contains Shares Outstanding<font size="-1"><sup>5</sup></font>:
, so its combined text is "Shares Outstanding5:"
. If you use ownText
it will just be "Shares Outstanding:"
.:
). Update the value in sharesOutstanding
accordingly.+
following the AAPL
.You can either break from your loops once you found a match, go back to your original version (with corrections as above) - see note - or you can try using a more sophisticated query which will only match once:
Elements elems = doc.select("td.yfnc_tablehead1:containsOwn("+sharesOutstanding+") + td.yfnc_tabledata1");
if ( ! elems.isEmpty() ) {
System.out.println( elems.get(0).owntext() );
}
This selector gives you all the td
elements whose class is yfnc_tabledata1
, whose immediate preceding sibling is a td
element whose class is yfnc_tablehead1
and whose own text contains the "Shares Outstanding:" string. This should basically select the exact TD you need.
Note: the previous version of this answer was a long rattle about the difference between Elements.select()
and Element.select()
. It turns out that I was dead wrong and your original version should have worked - if you had corrected the four points above. So to set the record straight: select()
on an Elements
actually does look inside each element and the resulting list may contain descendents of any of the elements in the original list that match the selection. Sorry about that.
Upvotes: 3