Dylan halls

Reputation: 52

proxy server not sending all data python

I'm creating an HTTP proxy in Python, but I'm having trouble: my proxy relays only the web server's first response, completely ignores the browser's next request, and the transfer of data just stops. Here's the code:

import socket

s = socket.socket()
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

bhost = '192.168.1.115'
port = 8080
s.bind((bhost, port))
s.listen(5)

def server(sock, data, host):
    p = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    p.connect((host, 80))
    p.send(data)
    rdata = p.recv(1024)
    print(rdata)
    sock.send(rdata)


while True:
    sock, addr = s.accept()
    data = sock.recv(1024)
    host = data.splitlines()[1][6:]
    server(sock, data, host)

Sorry about the code; this is just a trial version. Any help will be much appreciated, as I am only 14 and have much to learn :-)

Upvotes: 1

Views: 1035

Answers (1)

u354356007

Reputation: 3215

Unfortunately, I don't really see how your code is supposed to work, so here are my thoughts on what a simple HTTP proxy should look like. A basic proxy server should:

  1. Accept connection from a client and receive an HTTP request.
  2. Parse the request and extract its destination.
  3. Forward requests and responses.
  4. (optionally) Support Connection: keep-alive.

Let's go step by step and write some very simplified code.

How does a proxy accept a client? A socket should be created and put into passive (listening) mode:

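import socket, select
sock = socket.socket()
sock.bind((your_ip, port))  # your_ip and port are placeholders for your own values
sock.listen()
while True:
    client_sock, addr = sock.accept()  # accept() returns a (socket, address) pair
    do_stuff(client_sock)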

Once the TCP connection is established, it's time to receive a request. Let's assume we're going to get something like this:

GET /?a=1&b=2 HTTP/1.1 
Host: localhost    
User-Agent: my browser details
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8    
Accept-Language: en-gb,en;q=0.5    
Accept-Encoding: gzip, deflate    
Connection: keep-alive

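In TCP, message boundaries aren't preserved, so a single recv may return only part of a request. We should keep reading until we have at least the first two lines (for a GET request), which give us the request line and the Host header, in order to know what to do next: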

def do_stuff(sock):
    data = receive_two_lines(sock)
    remote_host = parse_request(data)
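
The two helpers above are left undefined. Here is one possible sketch of them, not a tested implementation. It assumes the request uses CRLF line endings and that the Host header arrives within the bytes read so far; the max_bytes limit is an assumption added here as a safety cap. parse_request also strips a ":port" suffix, which assumes a hostname or IPv4 address rather than an IPv6 literal.

```python
import socket

def receive_two_lines(sock, max_bytes=65536):
    """Read from sock until at least two CRLF-terminated lines have arrived."""
    data = b""
    while data.count(b"\r\n") < 2:
        chunk = sock.recv(1024)
        if not chunk:  # empty read: the peer closed the connection early
            break
        data += chunk
        if len(data) > max_bytes:  # hypothetical cap, not part of the answer
            raise ValueError("request header too large")
    return data

def parse_request(data):
    """Return the Host header value as str, without any :port suffix."""
    # Skip the request line, then scan the header lines received so far.
    for line in data.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            # Assumes a hostname or IPv4 address, not an IPv6 literal.
            return value.strip().decode("ascii").split(":")[0]
    raise ValueError("no Host header found")
```

Looping on recv is what copes with the missing message boundaries: the function keeps appending chunks until enough of the header has arrived, however TCP happened to split it.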

Once we have the remote hostname, it's time to forward requests and responses:

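def do_stuff(client_sock):
    data = receive_two_lines(client_sock)
    remote_host = parse_request(data)
    # getaddrinfo returns a list of (family, type, proto, canonname, sockaddr);
    # take the IPv4 address from the first result
    info = socket.getaddrinfo(remote_host, 80, socket.AF_INET, socket.SOCK_STREAM)
    remote_ip = info[0][4][0]

    webserver = socket.socket()
    webserver.connect((remote_ip, 80))

    webserver.sendall(data)
    while it_makes_sense():
        # Wait on both sockets in a single call: two separate blocking selects
        # would stall on the client while the server's response sits unread
        ready = select.select([client_sock, webserver], [], [])[0]

        if client_sock in ready:
            webserver.sendall(client_sock.recv(1024))
        if webserver in ready:
            client_sock.sendall(webserver.recv(1024))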

Note the use of select: this is how we know whether a remote peer has sent us data. I haven't run or tested this code, and there are things left to do:

  1. Chances are you will get several GET requests in a single client_sock.recv(1024) call because, again, message boundaries aren't preserved in TCP. You should probably look for additional GET requests each time you receive data.
  2. Requests differ for POST, HEAD, PUT, DELETE and other methods. Parse them accordingly.
  3. Browsers and servers usually reuse one TCP connection by setting the Connection: keep-alive header, but they may also decide to drop it. Be ready to detect disconnects and sockets closed by the remote peer (for simplicity's sake, this is called it_makes_sense() in the code).
  4. bind, listen, accept, recv, send, sendall, getaddrinfo, select - all of these functions can raise exceptions. It's better to catch them and act accordingly.
  5. The code currently serves one client at a time.
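
Points 3 and 5 could be approached roughly as follows. This is only a sketch: the names relay and serve_forever and the bufsize parameter are made up for illustration. It treats an empty recv or a socket error as "the peer has gone away" and stops relaying in both directions, which is a simplification, since a real proxy might keep forwarding the other direction after a half-close.

```python
import select
import socket
import threading

def relay(client_sock, webserver, bufsize=4096):
    """Copy bytes in both directions until either side closes or fails."""
    peers = {client_sock: webserver, webserver: client_sock}
    while True:
        # One select call watches both sockets at once.
        readable, _, _ = select.select(list(peers), [], [])
        for src in readable:
            try:
                chunk = src.recv(bufsize)
                if chunk:
                    peers[src].sendall(chunk)
            except OSError:
                chunk = b""  # treat a socket error like a disconnect
            if not chunk:  # empty read: the peer closed its end
                return

def serve_forever(listen_sock, handler):
    """Accept clients forever, running handler(client_sock) in its own thread."""
    while True:
        client_sock, _addr = listen_sock.accept()
        threading.Thread(target=handler, args=(client_sock,), daemon=True).start()
```

With something like this, the forwarding loop in do_stuff could become a call to relay(client_sock, webserver), and the accept loop could become serve_forever(sock, do_stuff), so a slow client no longer blocks the others.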

Upvotes: 1
