Reputation: 429
I understand that this code scrapes the title "Google Inc (GOOG)" from http://finance.yahoo.com/q?s=goog:
String name = doc.select(".title h2").first().text();
How can I scrape the title and the ticker symbol separately, as "Google Inc" and "GOOG"?
Upvotes: 0
Views: 526
Reputation: 19648
(1) "I have to scrape" solution:
This is a short answer that leaves out exception handling; it is brief and works out of the box.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// the class name is arbitrary; it only wraps main so the snippet compiles
public class YahooTitleScraper {
    public static void main(String[] args) throws IOException {
        // fetch the HTML and parse it into a Document
        String url = "http://finance.yahoo.com/q?s=goog";
        Document doc = Jsoup.connect(url).get();
        // locate the quote summary header, then the title div, then its h2 tag
        Element header = doc.select("div[id=yfi_rt_quote_summary]").get(0);
        Element title = header.select("div[class=title]").get(0);
        String h2 = title.select("h2").get(0).text();
        // split on the open parenthesis (escaped, because split takes a regex)
        // and strip the close parenthesis
        // TODO - use a regular expression to handle titles with multiple "()" pairs
        String[] parts = h2.split("\\(");
        String name = parts[0].trim();
        String shortname = parts[1].replace(")", "").trim();
        System.out.println(name);
        System.out.println(shortname);
    }
}
Output looks like this:
Google Inc.
GOOG
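As for the TODO in the code above, here is a minimal sketch of how a regular expression could cope with titles that contain more than one "()" pair. It assumes the ticker is always the last parenthesised group in the h2 text. TitleParser and split are hypothetical names introduced for illustration, and the sketch uses only java.util.regex, so it can be tried without fetching the page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): splits "Name (TICKER)" into two parts.
final class TitleParser {
    // The greedy (.*) pushes the match to the LAST parenthesised group,
    // so a name that itself contains "(...)" is kept intact.
    private static final Pattern TITLE =
            Pattern.compile("^(.*)\\(([^()]*)\\)\\s*$");

    /** Returns {name, ticker}, or null if the text has no trailing "(...)" group. */
    static String[] split(String h2) {
        Matcher m = TITLE.matcher(h2);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = split("Google Inc. (GOOG)");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

In the scraper you would pass the h2 string to TitleParser.split instead of calling h2.split("\\("), and check for null before reading the two parts.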
(2) "I don't have to scrape" solution:
Here is a really nice post showing how to download Yahoo data programmatically.
I am also an R user, and it is extremely easy to get Yahoo Finance data inside R. You can do the analysis there and save the results to a file or database if you want. :)
Upvotes: 2
Reputation: 2572
You want to scrape the elements with the IDs "yfs_184_goog", "yfs_c63_goog" and "yfs_p43_goog".
Those are the big black number, the little red/green number next to it, and the percentage.
"Screen scrape" with Jsoup with element who has ID
Upvotes: 1