user2822187

Reputation: 307

Use data retrieved from HTTPClient into JSoup

I am using HTTPClient to connect to a website. The following snippet of code is used for this purpose:

 byte[] responseBody = method.getResponseBody();
 System.out.println(new String(responseBody));

The above code prints the HTML of the website. I then wanted to extract only some of the data from it, which I was able to do with JSoup using the following snippet:

Document doc = Jsoup.connect(url).get();

In the code above I pass the website's address directly as "url", which means I do not need HTTPClient at all if I use JSoup. Is there a way to feed the "responseBody" retrieved with HTTPClient into JSoup, so that I do not have to call Document doc = Jsoup.connect(url).get();?

Thanks

Upvotes: 6

Views: 3660

Answers (1)

StoopidDonut

Reputation: 8617

You can parse the HTML directly through Jsoup#parse:

Document doc = Jsoup.parse(new String(responseBody));

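Although I have concerns about converting a byte array to a String directly (new String(byte[]) decodes with the platform's default charset, which is not necessarily the page's charset), in your case it should work fine.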
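To address that concern without relying on the platform default, one option is to decode the bytes with the charset the server declares in its Content-Type header, and only then hand the String to Jsoup.parse. The sketch below uses only the JDK; the class and method names (ResponseDecoding, charsetFromContentType, decodeBody) are hypothetical, and falling back to UTF-8 when no usable charset is declared is an assumption of this sketch, not something Jsoup or HTTPClient prescribes.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class ResponseDecoding {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // header value such as "text/html; charset=ISO-8859-1".
    // Falls back to UTF-8 (an assumption) when no usable charset is declared.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes around the value, e.g. charset="utf-8"
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                        break; // unknown or malformed name: use the fallback below
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw response body with the declared charset instead of the
    // platform default that new String(byte[]) would use.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType));
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1 only round-trips if decoded with that charset
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        String html = decodeBody(latin1, "text/html; charset=ISO-8859-1");
        System.out.println(html.equals("caf\u00e9"));
    }
}
```

With Commons HttpClient 3.x, the header value could presumably be read with method.getResponseHeader("Content-Type") (check the returned Header for null, then call getValue()), and the decoded String passed on as Jsoup.parse(decodeBody(responseBody, contentType)).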

Alternatively, you can use URLConnection, get a handle on its InputStream, and read it into a String using the charset the server declares:

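URLConnection connection = new URL("http://www.stackoverflow.com").openConnection();
// getContentEncoding() returns the Content-Encoding header (e.g. "gzip"), not the
// charset; the charset is a parameter of Content-Type, e.g. "text/html; charset=utf-8"
String contentType = connection.getContentType();
String charset = "UTF-8"; // assumed fallback when the server declares no charset
if (contentType != null && contentType.toLowerCase().contains("charset=")) {
    charset = contentType.substring(contentType.toLowerCase().indexOf("charset=") + 8)
            .split(";")[0].replace("\"", "").trim();
}
InputStream inStream = connection.getInputStream();
String htmlText = org.apache.commons.io.IOUtils.toString(inStream, charset);

Document document = Jsoup.parse(htmlText);
Elements els = document.select("tbody > tr > td");

for (Element el : els) {
    System.out.println(el.text());
}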

This would print something like the following (the exact output depends on the live page):

Stack Overflow Server Fault Super User Web Applications Ask Ubuntu Webmasters Game Development TeX - LaTeX
Programmers Unix & Linux Ask Different (Apple) WordPress Answers Geographic Information Systems Electrical Engineering Android Enthusiasts Information Security
Database Administrators Drupal Answers SharePoint User Experience Mathematica more (14)
...

Upvotes: 6
