eatonphil
eatonphil

Reputation: 13692

Unreliable http client despite polling file descriptor

I am trying to write a simple HTTP client in OCaml. I understand that it would be easier to use a library like cohttp, etc. I am doing this for my own sake, so no need to make the suggestion.

Here is my code.

module Connection = struct
    let sock_fd =
        let s_fd = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
        Unix.setsockopt s_fd Unix.TCP_NODELAY true;
        s_fd

    let read_timeout = 10.0

    let read_from_sock () =
        let buff_size = 4096 in
        let buff = Bytes.create buff_size in
        let rec read_all response =
            let (read_fds, _, _) = Unix.select [sock_fd] [] [] read_timeout in
            match read_fds with
            | [] -> response
            | (read_fd :: _) -> begin
                let _ = Unix.read read_fd buff 0 buff_size in
                let current_response = response ^ buff in
                read_all current_response
            end in
        read_all ""

    let write_to_sock str =
        let len = String.length str in
        let _ = Unix.write sock_fd str 0 len in ()

    let make_request request serv_addr =
        Unix.connect sock_fd serv_addr;
        write_to_sock request

    class connection address port =
        object
            val serv_addr = Unix.ADDR_INET (Unix.inet_addr_of_string address, port)

            method get_response (request: string) =
                make_request request serv_addr;
                let response = read_from_sock () in
                Printf.printf "%s\n" response;
                Unix.shutdown sock_fd Unix.SHUTDOWN_ALL;
                Unix.close sock_fd
        end

    let create address port = new connection address port
end

let connection = Connection.create "54.175.219.8" 80;;
connection#get_response "GET / HTTP/1.1\r\nHost: www.httpbin.org\r\n\r\n"

As I posted before - if you find it helpful - I'd imagine a (very rough) C equivalent is something like this:

int sock_fd = socket(PF_INET, SOCK_STREAM);
setsockopt(sock_fd, TCP_NODELAY, 1);

serv_addr addr {"54.175.219.8", 80};
connect(sock_fd, &serv_addr);
write(sock_fd, "GET / HTTP/1.1\r\nHost: www.httpbin.org\r\n\r\n");

char buffer[512];

while (sock_fd = select(sock_fd, 10.0)) {
    if (!sock_fd) break;
    read(sock_fd, &buffer);
    printf("%s\n", buffer);
    flush(stdout);
}

shutdown(sock_fd, SHUTDOWN_ALL);
close(sock_fd);

When I execute this, I get extremely varied results. One time, I did actually get the entire page. But most of the time, it gets cut off about 80% of the way through the page. I tried increasing the timeout to no avail.

I thought if I polled the file descriptor I would be able to reliably know when there is no more data like this blog suggests. It seems like this method was an improvement on looping until the read size was less than the buffer_size, but I guess not? What am I missing?

UPDATE:

I edited my code to check if the read size was less than the buffer size. However, this seems redundant. If there is more to read, select will return the file descriptor. If there is no more to read, it will not and I will just return what I've read. This is the new code:

let r = Unix.read read_fd buff 0 buff_size in
let current_response = response ^ buff in
if r < buff_size
then current_response
else read_all response

But actually this is just wrong. This completely takes away the point of polling the file descriptor. Perhaps the issue is still with having read less than buff_size data... But I really have no idea for any other way I might deal with it. Whatever is read (regardless of < buff_size or not) will still be appended to the response. read_all will attempt to complete reading until select no longer returns the file descriptor, at which point, there should be nothing more to read.

FINAL SOLUTION (thanks to @ivg):

let read_from_sock () =
    let buff_size = 4096 in
    let buff = Bytes.create buff_size in
    let rec read_all response =
        let (read_fds, _, _) = Unix.select [sock_fd] [] [] read_timeout in
        let rec read_all_helper current_response =
            match read_fds with
            | [] -> current_response
            | (read_fd :: _) -> begin
                let r = Unix.read read_fd buff 0 buff_size in
                let current_response = response ^ (String. sub buff 0 r) in
                if r < buff_size then read_all current_response
                else read_all_helper current_response
            end in
        read_all_helper response in
    read_all ""

Upvotes: 0

Views: 86

Answers (1)

ivg
ivg

Reputation: 35240

Yes, that's actually that kind of problems, that I was expecting from your code, based on your previous post. Here is the root of the Evil:

let _ = Unix.read read_fd buff 0 buff_size in

You can't ignore the result of read, because it is not guaranteed, that read call will read exactly buff_size, it can return less data (so called "short read"). The same issue is with the write call. So, you need carefully work with your buffers, to rebuild the data, after short reads. Another issue, is that calls can be interrupted with signals, but I think that you don't hit this right now.

Upvotes: 2

Related Questions