jincy abraham

Reputation: 579

Parse Stack Overflow page source code and get accepted answer

I am trying to write a function that takes the URL of any Stack Overflow question, fetches the page source, parses it, and returns both the accepted answer and the answer with the most upvotes.

I am new to this and don't know how to approach it. This is what I've tried so far using jsoup; it just returns the first answer on the page.

protected void doHtmlParse(String url) {
    Document doc;
    try {
        // Fetch and parse the question page.
        doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                  .referrer("http://www.google.com")
                  .get();
        // get(0) takes the first answer cell on the page, accepted or not.
        Element answer = doc.select("td[class=answercell]").get(0);
        System.out.println("Answer is\n" + answer.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}

I only need to display the answer part, but it has to be the accepted answer. How do I approach this?

Upvotes: 1

Views: 47

Answers (2)

jincy abraham

Reputation: 579

I am now able to get the accepted answer via this code.

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

protected void doHtmlParse(String url) {
    try {
        // Fetch and parse the question page.
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                  .referrer("http://www.google.com")
                  .get();
        // Container of the accepted answer; first() returns null if there is none.
        Element answer = doc.select("div[class=answer accepted-answer]").first();
        if (answer == null) {
            System.out.println("No accepted answer found.");
            return;
        }
        // Print the text of the answer body cell inside that container.
        for (Element td : answer.getElementsByTag("td")) {
            if (td.attr("class").equals("answercell")) {
                System.out.println("\n\nAccepted answer is\n" + td.text());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Upvotes: 0

ra2085

Reputation: 804

You don't really need to parse HTML. Use their REST API.

Have a look.

Here's an example. Note the is_accepted attribute.
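To call the API for an arbitrary question link, the numeric question ID has to be pulled out of the URL first. Here is a minimal sketch, assuming links of the form https://stackoverflow.com/questions/<id>/<slug> and the API's /2.3/questions/{ids}/answers route. The class and method names are made up for illustration, and the API version and the withbody filter are worth checking against the current API docs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original post): turns a question link
// into a Stack Exchange API request URL for that question's answers.
final class StackApiUrls {
    // Assumed link shape: .../questions/<numeric id>[/<slug>...]
    private static final Pattern QUESTION_ID = Pattern.compile("/questions/(\\d+)");

    private StackApiUrls() {}

    /** Extracts the numeric question ID, or throws if the link has none. */
    static long questionId(String questionUrl) {
        Matcher m = QUESTION_ID.matcher(questionUrl);
        if (!m.find()) {
            throw new IllegalArgumentException("No question ID in: " + questionUrl);
        }
        return Long.parseLong(m.group(1));
    }

    /** Builds the answers request, sorted by score with the highest first. */
    static String answersUrl(String questionUrl) {
        return "https://api.stackexchange.com/2.3/questions/" + questionId(questionUrl)
                + "/answers?order=desc&sort=votes&site=stackoverflow&filter=withbody";
    }

    public static void main(String[] args) {
        System.out.println(answersUrl("https://stackoverflow.com/questions/12345/some-title"));
    }
}
```

The response is JSON and the API compresses its responses, so whatever HTTP client fetches this URL needs to handle gzip before the body is parsed.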

EDIT:

Well, once you've got the chosen answer's id through the API, you could pull its HTML out of the parsed page like this:

 String answer = document.getElementById("answer-"+id).outerHtml();
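Once the API response is parsed, picking the accepted answer and the top-voted one comes down to a filter and a max. A sketch, assuming each answer has already been reduced to its answer_id, score and is_accepted fields; the Answer and AnswerPicker classes are hypothetical and JSON parsing is left out:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical holder for the three API fields used here:
// answer_id, score and is_accepted.
final class Answer {
    final long answerId;
    final int score;
    final boolean accepted;

    Answer(long answerId, int score, boolean accepted) {
        this.answerId = answerId;
        this.score = score;
        this.accepted = accepted;
    }
}

final class AnswerPicker {
    private AnswerPicker() {}

    /** The accepted answer, if the question has one. */
    static Optional<Answer> accepted(List<Answer> answers) {
        return answers.stream().filter((Answer a) -> a.accepted).findFirst();
    }

    /** The answer with the highest score; empty if there are no answers. */
    static Optional<Answer> topVoted(List<Answer> answers) {
        return answers.stream().max(Comparator.comparingInt((Answer a) -> a.score));
    }

    /** Element id of an answer in the question page ("answer-" + id). */
    static String elementId(Answer answer) {
        return "answer-" + answer.answerId;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the calls.
        List<Answer> answers = Arrays.asList(
                new Answer(101, 3, true),
                new Answer(102, 7, false));
        accepted(answers).ifPresent(a -> System.out.println("accepted: " + elementId(a)));
        topVoted(answers).ifPresent(a -> System.out.println("top voted: " + elementId(a)));
    }
}
```

Both helpers return an Optional because a question may have no accepted answer, or no answers at all. The accepted answer and the top-voted answer can also be the same one.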

Upvotes: 1
