I am trying to extract the source of a website. I have researched a bit, and many solutions point to using HttpClient and HttpContext, but the problem is that I cannot use a URL to get this source. The website I am using is login-based, and no matter who you are logged in as, it displays the same URL (but, of course, the information to be extracted differs per user). Therefore, I was wondering if there is a way to get the source directly from, perhaps, a WebView or something of the sort. In summary, I cannot go through the URL because it is the same for everyone and basically redirects to a generic login page.
Sorry if I am missing something; I am new to this. Thank you for the help in advance.
EDIT:
I have found a URL that is different per user, but there is another, related problem. Using jsoup, I can do Jsoup.connect("http://www.stackoverflow.com/").get().html(); (with the URL replaced by the one I'm trying to access), and this does get the HTML source, but the problem arises again: it asks for login information when I try to access a username/password-protected page. I need to be able to enter the username and password once, store the session somewhere temporary (cookies/cache?), and have jsoup reuse it so that it stops asking for login credentials every time I request the source of a given URL. I still cannot find a way around this...
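A minimal sketch of the "log in once, then reuse the session" idea using only the JDK, assuming the site keeps the login in a session cookie (as form-login sites usually do). Installing a java.net.CookieManager as the default CookieHandler makes HttpURLConnection store the Set-Cookie header from the login response and send it back on later requests. The tiny local server, the /login and /profile paths, and the SESSION cookie are hypothetical stand-ins for the real site, so the example runs without network access. Note that CookieHandler.setDefault is JVM-wide. If you stay with jsoup, the equivalent idea is to take the cookies from the Connection.Response of the login POST (response.cookies()) and pass them to later requests with .cookies(...).

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class SessionDemo {
    // Sends one request and returns the response body. Cookies are stored and
    // re-sent automatically by whatever CookieHandler is installed as the default.
    static String request(String url, String method, String formBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        if (formBody != null) {
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(formBody.getBytes(StandardCharsets.UTF_8));
            }
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in for the login-protected site, bound to a free local port.
    static HttpServer startFakeSite() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login", exchange -> {
            exchange.getRequestBody().readAllBytes(); // credentials are not checked here
            exchange.getResponseHeaders().add("Set-Cookie", "SESSION=abc123; Path=/");
            reply(exchange, "logged in");
        });
        server.createContext("/profile", exchange -> {
            String cookie = exchange.getRequestHeaders().getFirst("Cookie");
            boolean loggedIn = cookie != null && cookie.contains("SESSION=abc123");
            reply(exchange, loggedIn ? "private page" : "please log in");
        });
        server.start();
        return server;
    }

    static void reply(HttpExchange exchange, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Logs in once, then fetches a protected page; the session cookie from the
    // login response is remembered by the CookieManager and sent again.
    static String fetchAfterLogin() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        try {
            HttpServer server = startFakeSite();
            try {
                String base = "http://127.0.0.1:" + server.getAddress().getPort();
                request(base + "/login", "POST", "j_username=ACTUAL_USER&j_password=ACTUAL_PASSWORD");
                return request(base + "/profile", "GET", null);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAfterLogin());
    }
}
```

The second request carries no credentials at all; it gets the private page only because the cookie set by the login response is re-sent.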
Well, if I understood correctly (let me know if I did not):
If it is user/password protected, shouldn't you issue an HTTP POST (that is what a browser does, for example) and read the response to that POST? Something like this:
http://www.informit.com/guides/content.aspx?g=java&seqNum=44
EDIT: Here is a sample.
I have a page that looks like this (it is oversimplified, but here it is):
<form action="../../j_spring_security_check" method="post" >
<input id="j_username" name="j_username" type="text" />
<input id="j_password" name="j_password" type="password"/>
<input type="image" class="submit" id="login" name="login" />
</form>
If this were opened in a browser, you would have to provide the username/password to get to the actual content "behind" this login page. What you really issue here is an HTTP POST (I bet it's the same in your case).
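To make the "it is just a POST" point concrete: when the form above is submitted, the browser sends the two fields as an application/x-www-form-urlencoded request body. A small sketch of building that body with the JDK's URLEncoder (the password value is made up to show the escaping; FormBody is a hypothetical helper name):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormBody {
    // Encodes form fields the way a browser does for
    // <form method="post"> with the default enctype.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in form order
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("j_username", "ACTUAL_USER");
        fields.put("j_password", "p@ss word");
        System.out.println(encode(fields)); // j_username=ACTUAL_USER&j_password=p%40ss+word
    }
}
```

The field names must match the name attributes of the inputs (j_username, j_password), not their labels.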
Now to get the same functionality in a programmatic way...
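You will need the Apache HttpClient library (you could probably do without it, but this is the easy way). Here is the Maven dependency for it. You are doing this for Android, right? From what I've read, Apache HttpClient is bundled with Android by default (note, though, that Android ships the newer org.apache.http 4.x API, not the commons-httpclient 3.1 API used below).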
<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;

public class HttpPost {
    public static void main(String[] args) {
        // Reuse one HttpClient: it keeps cookies (e.g. the session cookie set at login)
        // in its HttpState, so later requests made with it stay logged in.
        HttpClient httpClient = new HttpClient();
        PostMethod postMethod = new PostMethod("http://localhost:20000/moika/moika/j_spring_security_check");
        // Same names as the <input name="..."> fields of the login form
        postMethod.addParameter("j_username", "ACTUAL_USER");
        postMethod.addParameter("j_password", "ACTUAL_PASSWORD");
        try {
            int status = httpClient.executeMethod(postMethod);
            System.out.println("STATUS-->" + status);
            if (status == HttpStatus.SC_MOVED_TEMPORARILY) { // 302
                // HttpClient 3.x does not follow redirects for a POST, so follow it by hand
                Header header = postMethod.getResponseHeader("location");
                String location = header.getValue();
                System.out.println("HEADER_VALUE-->" + location);
                GetMethod getMethod = new GetMethod(location);
                try {
                    httpClient.executeMethod(getMethod);
                    String content = getMethod.getResponseBodyAsString();
                    System.out.println("CONTENT-->" + content);
                } finally {
                    getMethod.releaseConnection();
                }
            } else {
                String contentInCaseOfNoRedirect = postMethod.getResponseBodyAsString();
                System.out.println("CONTENT-->" + contentInCaseOfNoRedirect);
            }
        } catch (Exception exception) {
            exception.printStackTrace();
        } finally {
            postMethod.releaseConnection();
        }
    }
}
This might look a bit weird, but my server answers the login with a redirect (302), and RFC 2616 says a client must not automatically follow a redirect in response to a POST without user confirmation, so HttpClient does not follow it for you; hence the small work-around.
If your server does not redirect after login, you can ignore the part where I check for 302.
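One caveat when following the 302 by hand: the Location header is not guaranteed to be an absolute URL (RFC 7231 allows a relative reference there), while GetMethod expects an absolute one. A sketch of resolving it against the URL of the POST with java.net.URI; RedirectTarget and the /moika/moika/home target are hypothetical:

```java
import java.net.URI;

class RedirectTarget {
    // Resolves a Location header value against the URL of the request that produced it.
    // URI.resolve leaves absolute URLs unchanged and completes relative ones.
    static String resolve(String requestUrl, String locationHeader) {
        return URI.create(requestUrl).resolve(locationHeader).toString();
    }

    public static void main(String[] args) {
        String login = "http://localhost:20000/moika/moika/j_spring_security_check";
        System.out.println(resolve(login, "/moika/moika/home")); // http://localhost:20000/moika/moika/home
        System.out.println(resolve(login, "http://example.com/other")); // unchanged
    }
}
```

The result can be passed to new GetMethod(...) in place of the raw header value. URI.create throws IllegalArgumentException if the header contains characters that are not legal in a URI.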
See what works for you.
Cheers, Eugene.
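See the Java tutorial on reading from and writing to a URLConnection: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html
or check this sample code, which reads the content of a URL (on its own it does not send any login information):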
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

try {
    URL url = new URL("http://www.w3schools.com/html/html_tables.asp");
    URLConnection connection = url.openConnection();
    // try-with-resources closes the reader and the underlying stream
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
        }
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
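Plain URLConnection does not remember a login by itself. A sketch of carrying the session by hand, assuming the login response sets a session cookie: collect the Set-Cookie values from the login response and turn them into one Cookie request header for later requests. CookieHeader is a hypothetical helper, and the JSESSIONID and theme values are made up.

```java
import java.net.HttpCookie;
import java.util.List;
import java.util.StringJoiner;

class CookieHeader {
    // Turns the Set-Cookie header values of a (login) response into a single
    // "Cookie" request header value; attributes such as Path/HttpOnly are dropped.
    static String fromSetCookie(List<String> setCookieValues) {
        StringJoiner header = new StringJoiner("; ");
        for (String value : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                header.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return header.toString();
    }

    public static void main(String[] args) {
        List<String> setCookies = List.of("JSESSIONID=abc123; Path=/moika; HttpOnly", "theme=dark");
        System.out.println(fromSetCookie(setCookies)); // JSESSIONID=abc123; theme=dark
    }
}
```

With URLConnection, the input would typically come from loginConnection.getHeaderFields().get("Set-Cookie") (which is null if no cookie was set), and the result would be sent with connection.setRequestProperty("Cookie", ...) before calling getInputStream().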