Green

Reputation: 713

Not able to read web page content

I am trying to read a web page's content using the code below, but it does not print the content as expected. The IDE shows no error and no exception is thrown. No proxy is set. Could anyone explain why it might not be working?

import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.oracle.com/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

When I debug, control does not enter the while loop, although the BufferedReader variable does reference a Java object.

Upvotes: 0

Views: 341

Answers (2)

Stephen C

Reputation: 719576

The accepted answer (and the comments) don't actually explain what is going on here and why the program doesn't work.

First of all, open the URL http://www.oracle.com/ in your favorite web browser. Notice how you actually end up with the URL https://www.oracle.com/index.html in the URL bar? What has happened is that the web server at http://www.oracle.com/ has REDIRECTED your browser to the new URL.

Redirects work by the server sending some kind of redirect response (status code 3xx) to the GET request that the browser makes. The browser reads the redirect response, extracts the target URL for the redirect, and then resends the GET request to the target URL. (This can be repeated more than once.)

So what is happening in your example is that your code is not respecting the redirect. Instead, it is simply treating the 3xx response as a normal response. Your code is then reading the "body" of the response, which is empty.
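One way to confirm this is to look at the raw first response instead of reading the body. The following is a minimal sketch, not part of the original code: the class name `RedirectProbe` and its helper methods are made up for illustration, and the exact status code and `Location` value the server returns are not something this answer can promise.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, introduced only for this illustration.
class RedirectProbe {
    // True for the 3xx status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Formats the status (and redirect target, if any) for printing.
    static String describe(int status, String location) {
        if (isRedirect(status)) {
            return "redirect " + status + " -> " + location;
        }
        return "status " + status;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.oracle.com/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Turn off automatic redirect handling so the first response is visible.
        conn.setInstanceFollowRedirects(false);
        try {
            System.out.println(
                    describe(conn.getResponseCode(), conn.getHeaderField("Location")));
        } finally {
            conn.disconnect();
        }
    }
}
```

If the server does answer with a redirect, this prints the 3xx code and the target URL rather than any page content.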

When you manually change the URL to the true target (or equivalent), you avoid the need for the redirect, and your code works. If you want your code to be capable of dealing with redirects, you need to write it differently.

However, in this case it is not sufficient to use HttpURLConnection and simply turn on the "follow redirects" option. The Java HTTP stack will not follow redirects to a different protocol (e.g. HTTP to HTTPS); see this Q&A:
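Independently of that Q&A, here is a rough sketch of what "writing it differently" could look like: follow the redirects by hand, resolving each `Location` header against the current URL. The class name `RedirectFollowingReader`, the cap of 5 hops, and the choice of UTF-8 for decoding are all assumptions made for this example. It also assumes every hop is an HTTP or HTTPS URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical class, introduced only for this illustration.
class RedirectFollowingReader {
    static final int MAX_REDIRECTS = 5; // arbitrary cap (assumption)

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location value (absolute or relative) against the current URL.
    static URL resolve(URL current, String location) throws IOException {
        return new URL(current, location);
    }

    // Follows redirects manually, including HTTP -> HTTPS, and returns
    // the connection for the first non-redirect response.
    static HttpURLConnection open(URL start) throws IOException {
        URL url = start;
        for (int hops = 0; hops <= MAX_REDIRECTS; hops++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn;
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect " + status + " without Location header");
            }
            url = resolve(url, location);
        }
        throw new IOException("Too many redirects");
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open(new URL("http://www.oracle.com/"));
        // UTF-8 is assumed here; a real client would check Content-Type.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Blindly following redirects across protocols has security implications (for example, an HTTPS to HTTP downgrade), so a real client may want to restrict which transitions it accepts.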

Upvotes: 0

Matthew Diana

Reputation: 1106

The URL http://www.oracle.com/ uses HTTP rather than HTTPS, so you won't see any output when attempting to print the website's contents. Try running your program with this URL instead: https://www.oracle.com/

Upvotes: 1
