Geek

Reputation: 3329

jsoup connect parameter

I access a webpage by passing a session ID and a URL, and the output is an HTML response. I want to use jsoup to parse this response and get the tag elements. The jsoup examples I have seen take a String to establish a connection. How do I proceed?

pseudo code:

I tried the above approach and got this exception:

java.io.IOException: 401 error loading URL http://www.abc.com/index
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:387)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)

Basically, entity.getContent() holds the HTML response, which I pass as a String to the connect method. But it doesn't work.

Upvotes: 0

Views: 3662

Answers (2)

BalusC

Reputation: 1109142

Apache HttpClient and jsoup do not share a cookie store. You basically need to pass the very same cookies that HttpClient has retrieved on to jsoup's Connection. You can find some concrete examples here:
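A minimal sketch of handing HttpClient's cookies over to jsoup, assuming HttpClient 4.x (the `org.apache.http` packages) and a jsoup version that has `Connection.cookies(Map)`. The `JSESSIONID` cookie and its value are hypothetical placeholders for whatever the site actually set. jsoup's request cookies are plain name/value pairs, so domain and path are dropped here.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.http.client.CookieStore;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.cookie.BasicClientCookie;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class CookieBridge {

    // Copies every cookie HttpClient has collected into a name/value map,
    // the shape jsoup's Connection.cookies(Map) expects.
    // Domain and path are ignored, so only do this when all stored cookies
    // belong to the site you are about to request.
    static Map<String, String> toJsoupCookies(CookieStore store) {
        Map<String, String> cookies = new HashMap<>();
        for (Cookie cookie : store.getCookies()) {
            cookies.put(cookie.getName(), cookie.getValue());
        }
        return cookies;
    }

    // Builds (but does not execute) a jsoup request carrying the same cookies.
    static Connection withSameCookies(String url, CookieStore store) {
        return Jsoup.connect(url).cookies(toJsoupCookies(store));
    }

    public static void main(String[] args) {
        // Hypothetical session cookie standing in for the one HttpClient received.
        BasicCookieStore store = new BasicCookieStore();
        BasicClientCookie session = new BasicClientCookie("JSESSIONID", "abc123");
        session.setDomain("www.abc.com");
        session.setPath("/");
        store.addCookie(session);

        Connection conn = withSameCookies("http://www.abc.com/index", store);
        // conn.get() would now send the JSESSIONID cookie along with the request.
        System.out.println(conn.request().cookie("JSESSIONID"));
    }
}
```

In the question's setup, the store would be the one attached to the `httpContext` (or the client's default store) used for the earlier, authenticated requests.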

Alternatively, you can just keep using HttpClient for firing HTTP requests and maintaining the cookies, and instead feed its HttpResponse to Jsoup#parse() as a String.

So this should do it:

HttpResponse httpResponse = httpclient1.execute(httpget, httpContext);
String html = EntityUtils.toString(httpResponse.getEntity());
Document doc = Jsoup.parse(html, testUrl);
// ...

By the way, you do not need to create a whole new HttpClient for a subsequent request; just reuse the httpclient you already created. Also, your way of obtaining the response as a String is clumsy. The second line of the example above shows the simplest way to do it.
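A slightly fuller sketch of the same idea, assuming HttpClient 4.3+ (`HttpClients`, `HttpClientContext`, `CloseableHttpResponse`). One client and one context are reused for every request, and jsoup only ever sees the resulting String. Selecting `a[href]` is just an illustrative choice of "tag elements"; `linkUrls` and `fetch` are hypothetical helper names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FetchThenParse {

    // Parsing step: turn the HTML String into a Document and collect the
    // absolute URL of every <a href>. baseUri lets "abs:href" resolve relative links.
    static List<String> linkUrls(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            urls.add(a.attr("abs:href"));
        }
        return urls;
    }

    // Network step: HttpClient sends the request (cookies live in the client/context),
    // and the entity is read into a String in one call.
    static String fetch(CloseableHttpClient client, HttpClientContext context, String url)
            throws IOException {
        try (CloseableHttpResponse response = client.execute(new HttpGet(url), context)) {
            return EntityUtils.toString(response.getEntity());
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "http://www.abc.com/index"; // URL from the question
        HttpClientContext context = HttpClientContext.create(); // reused for all requests
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            // ... authenticate first with this same client and context ...
            String html = fetch(client, context, url);
            linkUrls(html, url).forEach(System.out::println);
        }
    }
}
```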

Upvotes: 1

RanRag
RanRag

Reputation: 49577

It shows HTTP error 401, which means:

Similar to 403 Forbidden, but specifically for use when authentication is possible but has failed or not yet been provided.

Therefore, I think you need to log in to the website from your Java code, or identify yourself by sending the session cookies with your requests.
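A minimal jsoup-only sketch of the "log in, then send the cookies" route, assuming the site uses a plain form login that sets a session cookie. The login URL, the `username`/`password` field names and the credentials are hypothetical; copy the real ones from the site's login form.

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoginThenFetch {

    // Builds the login POST without sending it.
    // URL and form field names are hypothetical placeholders.
    static Connection loginRequest(String loginUrl, String user, String password) {
        return Jsoup.connect(loginUrl)
                .data("username", user, "password", password)
                .method(Connection.Method.POST);
    }

    public static void main(String[] args) throws IOException {
        Connection.Response login =
                loginRequest("http://www.abc.com/login", "me", "secret").execute();

        // Cookies the server set in response to the login (e.g. a session ID).
        Map<String, String> cookies = login.cookies();

        // Identify yourself on the protected page by sending them back.
        Document page = Jsoup.connect("http://www.abc.com/index")
                .cookies(cookies)
                .get();
        System.out.println(page.title());
    }
}
```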

Upvotes: 0
