Brian

Reputation: 1989

Getting a web page with Sockets

I am learning socket programming and have run into an issue I need help with. I am trying to write a small Java class that connects to a web host, downloads the default page, then disconnects from the host. I know it is simpler to use URLConnection for this, but I am trying to learn the Socket classes. I can connect to a web server, but I am having difficulty pulling in the page. This is what I have working (and not working) so far:

import java.io.*;
import java.net.*;
import java.lang.IllegalArgumentException;
public class SocketsFun{
    public static void main(String[] myArgs){
        // Set some variables
        String theServer = null;
        String theLine = null;
        int thePort = 0;
        Socket theSocket = null;
        boolean exit = false;
        boolean socketCheck = false;
        BufferedReader theInput = null;

        // Grab the server and port number
        try{
            theServer = myArgs[0];
            thePort = Integer.parseInt(myArgs[1]);
            System.out.println("Opening a connection to " + theServer + " on port " + thePort);
        } catch(ArrayIndexOutOfBoundsException aioobe){
            System.out.println("usage: SocketsFun host port");
            exit = true;
        } catch(NumberFormatException nfe) {
            System.out.println("usage: SocketsFun host port");
            exit = true;
        }

        if(!exit){
            // Open the socket
            try{
                theSocket = new Socket(theServer, thePort);
            } catch(UnknownHostException uhe){
                System.out.println("* " + theServer + " does not exist");
            } catch(IOException ioe){
                System.out.println("* " + "Connection Refused");
            } catch(IllegalArgumentException iae){
                System.out.println("* " + thePort + " Not A Valid TCP/UDP Port.");
            }

            // Print out some stuff
            try{
                System.out.println("Connected Socket: " + theSocket.toString());
            } catch(Exception e){
                System.out.println("* " + "No Open Socket");
            }

            try{
                theInput = new BufferedReader(new InputStreamReader(theSocket.getInputStream()));
                while ((theLine = theInput.readLine()) != null){
                    System.out.println(theLine);
                }
                theInput.close();
            } catch(IOException ioe){
                System.out.println("* " + "No Data To Read");
            } catch(NullPointerException npe){
                System.out.println("* " + "No Data To Read");
            }

            // Close the socket
            try{
                socketCheck = theSocket.isConnected();
            } catch(NullPointerException npe){
                System.out.println("* " + "No Socket To Close");
            }
        }
    }
}

All I want is for this class to print what "curl", "lynx -dump", or "wget" would output. Any and all help will be greatly appreciated.

Upvotes: 4

Views: 3749

Answers (2)

Robert

Reputation: 6540

You have the right idea, but you're not submitting an HTTP request. Send:

GET / HTTP/1.1\r\nHost: <hostname>\r\n\r\n

This follows the format

[METHOD] [PATH] HTTP/1.1 [CRLF]
Host: [HOSTNAME] [CRLF]
OTHER: HEADERS [CRLF]
[CRLF]

You should get a response that follows a similar format - header, blank line, and data. Read about the HTTP protocol for more info.
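As a rough illustration of the format above, here is a minimal sketch that assembles such a request as a Java string. The class name HttpRequestSketch, the method buildRequest, and the host example.com are made up for this example; the Connection: close header is only one possible "other header".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original question or answer.
public class HttpRequestSketch {

    // Assembles: [METHOD] [PATH] HTTP/1.1 CRLF, Host header, other headers, blank line.
    static String buildRequest(String method, String path, String host,
                               Map<String, String> otherHeaders) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        for (Map.Entry<String, String> header : otherHeaders.entrySet()) {
            sb.append(header.getKey()).append(": ").append(header.getValue()).append("\r\n");
        }
        sb.append("\r\n"); // the empty line that ends the header section
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Connection", "close"); // assumed extra header, for illustration
        // Show the CR LF pairs visibly instead of sending them anywhere.
        System.out.print(buildRequest("GET", "/", "example.com", headers)
                .replace("\r\n", "\\r\\n\n"));
    }
}
```

The string returned by buildRequest is what would be written to the socket's output stream before reading anything back.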

EDIT: Perhaps it'd help to get a feel for the HTTP request syntax first. It's pretty simple, and a good thing to know generally. Open a terminal and use netcat (preferable) or telnet: run netcat google.com 80 or telnet google.com 80, then type:

GET / HTTP/1.1[ENTER]
Host: google.com[ENTER]
[ENTER]

I get the response (following the second return):

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Thu, 09 Dec 2010 00:03:39 GMT
Expires: Sat, 08 Jan 2011 00:03:39 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

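Once you get a feel for the request syntax, just write that to the socket, then read the lines until the server closes the connection, like you're doing. Note that HTTP/1.1 connections are persistent by default, so the server may keep the socket open after the response; adding a "Connection: close" header asks it to close once the response has been sent.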
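Put together, the approach might look like the sketch below: write the request to the socket's output stream, flush it, then read lines until the stream ends. The names SocketFetchSketch, exchange, and fetch are invented for this example. The sketch assumes the response can be decoded as UTF-8, and it adds "Connection: close" so that the read loop terminates.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch, not a full HTTP client.
public class SocketFetchSketch {

    // Writes a GET request to `out`, then reads `in` line by line until end of stream.
    // I/O failures are rethrown as UncheckedIOException to keep the sketch short.
    static String exchange(InputStream in, OutputStream out, String host, String path) {
        String request = "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n" // ask the server to close after responding
                + "\r\n";
        try {
            out.write(request.getBytes(StandardCharsets.UTF_8));
            out.flush(); // make sure the request actually leaves the buffer

            // Assumption: the response is decodable as UTF-8.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the socket, performs the exchange, and closes the socket afterwards.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            return exchange(socket.getInputStream(), socket.getOutputStream(), host, path);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: SocketFetchSketch host port");
            return;
        }
        System.out.print(fetch(args[0], Integer.parseInt(args[1]), "/"));
    }
}
```

This prints the raw response, status line and headers included. Unlike curl or wget it does not follow redirects, and if the server uses chunked transfer encoding the chunk-size lines will appear in the output as well.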

Upvotes: 6

Cameron Skinner

Reputation: 54306

You need to write something to the socket's output stream. Web servers wait for a request from the client before sending anything: writing a request line such as "GET / HTTP/1.0" followed by a blank line will ask the server to return the default page.

Your code doesn't write anything so the server will wait forever.

Upvotes: 0
