Asad

Reputation: 1300

How to programmatically send an HTTP request to an inner page link with Java?

I am trying to write a Java application that connects to a server and then accesses a link on that server's page. For example, I have the link "http://goodserver.com", and I can connect to it with this code:

// Socket expects a bare host name, not a URL with a scheme
Socket sock = new Socket("goodserver.com", 80);
InetAddress addr = sock.getInetAddress();
System.out.println("Connected to " + addr);

I can also read the whole source of this page. However, the page contains buttons with links. In a browser I can simply click those buttons and follow the links. For example, a button named "Test" has the corresponding link "http://goodserver.com/targets/Test".

I want to access this link from Java, but the problem is that I can't reach it directly. I don't want to click the link from Java as described in "Programmatically click a webpage button", which I have read. I just want to know the mechanism by which a browser can access the link after loading the home page, while a plain Java HTTP request cannot.

I read the page with this code:

URL url = new URL("http://goodserver.com");
// try-with-resources closes both streams even if reading fails
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
     BufferedWriter writer = new BufferedWriter(new FileWriter("data.html"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
        writer.write(line);
        writer.newLine();
    }
}

When I replace the home page link with my target button link "http://goodserver.com/targets/Test", I get the home page source code, not the target page.

I know that a browser also sends HTTP requests to get pages, so this should be possible from Java. Thanks in advance.

Upvotes: 0

Views: 1951

Answers (1)

Gui Meira

Reputation: 885

If the result of the second request depends on whether you accessed the home page or not, your problem probably has something to do with cookies.

HTTP is a stateless protocol, which means that each request is independent of the others. When you open a page and click a button, you generate a new request to that other URL, but the server has no clue about who you are or what pages you opened before.

Cookies make it possible for the server to "remember" who you are. They work as follows: when you request a page, the server sends the contents of that page to you, but it can also send some extra information called a cookie. Your browser stores that information, and every time you make another request to the same server, the browser sends the cookies along with that request. So, even though the server doesn't know at first who is making the request, it can now look at the cookie, recognise that it sent that information to you, and conclude that you must be the one making that request.
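To make that round trip concrete, here is a minimal sketch of doing it by hand with only the JDK. The class name ManualCookies and its helper methods are made up for illustration, and the goodserver.com URLs are the placeholders from the question. It deliberately ignores cookie attributes such as Path, Domain and Expires, and it will miss cookies set on intermediate redirects, so treat it as a picture of the mechanism rather than a complete cookie implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ManualCookies {

    // Reduce Set-Cookie header values such as "SESSION=abc; Path=/; HttpOnly"
    // to the "name=value" pairs a client sends back in its Cookie header.
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // Everything after the first ';' is an attribute (Path, Expires, ...).
            String pair = value.split(";", 2)[0].trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    // Fetch a URL, optionally sending a Cookie header, and return the body.
    // Any Set-Cookie values in the response are appended to 'received'.
    static String fetch(String address, String cookieHeader, List<String> received)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            conn.setRequestProperty("Cookie", cookieHeader);
        }
        for (Map.Entry<String, List<String>> header : conn.getHeaderFields().entrySet()) {
            // The status line is stored under a null key, so compare this way round.
            if ("Set-Cookie".equalsIgnoreCase(header.getKey())) {
                received.addAll(header.getValue());
            }
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> received = new ArrayList<>();
        // First request: load the home page and remember its cookies.
        fetch("http://goodserver.com", null, received);
        // Second request: send those cookies back with the inner page request.
        String target = fetch("http://goodserver.com/targets/Test",
                toCookieHeader(received), received);
        System.out.println(target);
    }
}
```

Whether this is enough for a particular site depends on what the server actually checks; some sites also look at headers such as Referer or User-Agent, which this sketch does not set.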

So, this is the part you are probably missing in your problem: storing the cookies that the server sends to you when you load the home page and then sending them again when you request the other page, to "remind" the server that you have already accessed the home page.

Naturally, you could do it by hand by parsing the HTTP headers, but I strongly recommend that you use some library to do this for you. The Apache HTTP Client is probably the best you can find in the Java world. Here's a short example of how you can keep cookies across requests:

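// Imports assume Apache HttpClient 4.x (4.3 or later); 5.x uses different packages.
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class CookiesExample {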

    public static void main(String[] args) throws Exception {
        //This object will store your cookies:
        BasicCookieStore cookieStore = new BasicCookieStore();

        //Create a client using our cookie store:
        CloseableHttpClient httpclient = HttpClients.custom()
                .setDefaultCookieStore(cookieStore)
                .build();

        try {
            //Execute request:
            HttpGet httpget = new HttpGet("https://example.com/");
            CloseableHttpResponse response = httpclient.execute(httpget);
            try {
                //Consume the response:
                HttpEntity entity = response.getEntity();
                EntityUtils.consume(entity);
            } finally {
                response.close();
            }

            //Whatever cookies that were sent by the server in that request 
            //are now stored in our cookie store. Subsequent requests will
            //send those cookies to the server.

            httpget = new HttpGet("https://example.com/my/awesome/internal/page");
            response = httpclient.execute(httpget);
            try {
                //Consume the response:
                HttpEntity entity = response.getEntity();
                EntityUtils.consume(entity);
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }
}
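
If adding a library is not an option, the JDK's own java.net.CookieManager can play a similar role for URL.openStream() and HttpURLConnection. The sketch below is only an illustration under assumptions: the class CookieManagerDemo and its throwaway local server are invented here to imitate the behaviour described in the question (the inner page is served only after the home page has handed out a cookie), and the cookie name "visited" is arbitrary. The real server may rely on something else entirely.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CookieManagerDemo {

    // Hypothetical stand-in for the real site: every response sets a cookie,
    // and "/targets/Test" serves the inner page only when that cookie comes back.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String cookie = exchange.getRequestHeaders().getFirst("Cookie");
            boolean known = cookie != null && cookie.contains("visited=yes");
            boolean wantsTarget = exchange.getRequestURI().getPath().equals("/targets/Test");
            String body = (known && wantsTarget) ? "target page" : "home page";
            exchange.getResponseHeaders().add("Set-Cookie", "visited=yes; Path=/");
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    // Same approach as the question: read the page through URL.openStream().
    static String read(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns { inner page without cookies, home page, inner page with cookies }.
    static String[] run() throws IOException {
        HttpServer server = startServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        try {
            // No cookie handler installed yet, so nothing is remembered.
            String cold = read(base + "/targets/Test");
            // From here on, URL connections store and resend cookies automatically.
            CookieHandler.setDefault(new CookieManager());
            String home = read(base + "/");
            String warm = read(base + "/targets/Test");
            return new String[] { cold, home, warm };
        } finally {
            CookieHandler.setDefault(null);
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String body : run()) {
            System.out.println(body);
        }
    }
}
```

Against this toy server the first request for the inner page falls back to the home page, and only the request made after the cookie handler has seen the home page gets the inner page. In the asker's code, the equivalent change would be a single CookieHandler.setDefault(new CookieManager()) call before the first request, assuming cookies really are what the server checks.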

Another possible solution would be to use an actual browser that takes care of all of that for you. JavaFX has a browser component that can be controlled from Java, and there is also Selenium, which lets you use a "driver" to control a real browser (Chrome, Firefox, IE, ...).

Upvotes: 1
