Reputation: 579
I am trying to write a function that takes an input URL of any Stack Overflow link, gets the source code of the page, parses it, gets the accepted answer, and also gets the answer with the most upvotes.
I am new to this and don't know how to approach it. Here is what I have tried so far with jsoup; it only returns the first answer on the page.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

protected void doHtmlParse(String url) {
    try {
        // Fetch and parse the question page
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .get();
        // first() returns null when nothing matches, whereas get(0) would throw
        Element answer = doc.select("td[class=answercell]").first();
        if (answer == null) {
            System.out.println("No answer found");
            return;
        }
        System.out.println("Answer is \n" + answer.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
I only need to display the answer part, but it has to be the accepted answer. How do I approach this?
Upvotes: 1
Views: 47
Reputation: 579
I am now able to get the accepted answer via this code.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

protected void doHtmlParse(String url) {
    try {
        // Fetch and parse the question page
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .get();
        // The accepted answer's container carries both classes;
        // a class selector also matches if more classes are added later
        Element answer = doc.select("div.answer.accepted-answer").first();
        if (answer == null) {
            System.out.println("No accepted answer found");
            return;
        }
        // Print only the text of the answer body cell
        for (Element td : answer.select("td.answercell")) {
            System.out.println("\n\nAccepted answer is \n" + td.text());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Upvotes: 0
Reputation: 804
You don't really need to parse the HTML. Use their REST API.
Have a look.
Here's an example. Note the is_accepted attribute.
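To go from a question link to an API request, something along these lines might work. This is only a sketch: the class and method names (StackApiUrls, questionId, answersUrl) are made up for illustration, it only recognises question links of the form /questions/<id>/... or /q/<id>, and the API version (2.3) and the site, order and sort query parameters are my assumptions about the Stack Exchange API, so check them against the API docs before relying on them.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StackApiUrls {
    // Assumed link shapes: https://stackoverflow.com/questions/<id>/<slug>
    // or the short form https://stackoverflow.com/q/<id>.
    private static final Pattern QUESTION_ID =
            Pattern.compile("stackoverflow\\.com/(?:questions|q)/(\\d{1,18})");

    /** Returns the numeric question id, or empty if the URL is not a question link. */
    static OptionalLong questionId(String url) {
        Matcher m = QUESTION_ID.matcher(url);
        return m.find()
                ? OptionalLong.of(Long.parseLong(m.group(1)))
                : OptionalLong.empty();
    }

    /** Builds the request URL for a question's answers (assumed: API 2.3, sorted by votes). */
    static String answersUrl(long questionId) {
        return "https://api.stackexchange.com/2.3/questions/" + questionId
                + "/answers?site=stackoverflow&order=desc&sort=votes";
    }

    public static void main(String[] args) {
        // Hypothetical link, used only to show the call sequence
        String link = "https://stackoverflow.com/questions/12345/some-title";
        questionId(link).ifPresent(id -> System.out.println(answersUrl(id)));
    }
}
```

The response is JSON, so you would still need an HTTP client and a JSON parser to read the is_accepted field from each answer.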
EDIT:
Well, after you've got the chosen answer through the API, you could do this:
String answer = document.getElementById("answer-"+id).outerHtml();
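Picking the "chosen" answer could look roughly like the sketch below, once the API response has been parsed. The Answer class here is a hypothetical holder for three fields I believe each answer object carries (answer_id, is_accepted, score); AnswerPicker and its method names are invented for this example, and the JSON parsing itself is left out.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class AnswerPicker {
    /** Hypothetical holder for the fields of interest from one parsed answer object. */
    static final class Answer {
        final long answerId;    // assumed to come from "answer_id"
        final boolean accepted; // assumed to come from "is_accepted"
        final int score;        // assumed to come from "score"

        Answer(long answerId, boolean accepted, int score) {
            this.answerId = answerId;
            this.accepted = accepted;
            this.score = score;
        }
    }

    /** The accepted answer, if the question has one. */
    static Optional<Answer> accepted(List<Answer> answers) {
        return answers.stream().filter(a -> a.accepted).findFirst();
    }

    /** The answer with the highest score, if there are any answers. */
    static Optional<Answer> topVoted(List<Answer> answers) {
        return answers.stream().max(Comparator.comparingInt((Answer a) -> a.score));
    }

    /** The element id to pass to document.getElementById, as in the line above. */
    static String elementId(Answer answer) {
        return "answer-" + answer.answerId;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real API output
        List<Answer> answers = Arrays.asList(
                new Answer(1L, false, 10),
                new Answer(2L, true, 3));
        accepted(answers).ifPresent(a -> System.out.println(elementId(a)));
        topVoted(answers).ifPresent(a -> System.out.println(elementId(a)));
    }
}
```

Both methods return Optional because a question may have no accepted answer, or no answers at all, so the caller has to handle the empty case before calling getElementById.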
Upvotes: 1