Reputation: 13692
I am trying to write a simple HTTP client in OCaml. I understand that it would be easier to use a library like cohttp, etc. I am doing this for my own sake, so no need to make the suggestion.
Here is my code.
module Connection = struct
let sock_fd =
let s_fd = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
Unix.setsockopt s_fd Unix.TCP_NODELAY true;
s_fd
let read_timeout = 10.0
let read_from_sock () =
let buff_size = 4096 in
let buff = Bytes.create buff_size in
let rec read_all response =
let (read_fds, _, _) = Unix.select [sock_fd] [] [] read_timeout in
match read_fds with
| [] -> response
| (read_fd :: _) -> begin
let _ = Unix.read read_fd buff 0 buff_size in
let current_response = response ^ buff in
read_all current_response
end in
read_all ""
let write_to_sock str =
let len = String.length str in
let _ = Unix.write sock_fd str 0 len in ()
let make_request request serv_addr =
Unix.connect sock_fd serv_addr;
write_to_sock request
class connection address port =
object
val serv_addr = Unix.ADDR_INET (Unix.inet_addr_of_string address, port)
method get_response (request: string) =
make_request request serv_addr;
let response = read_from_sock () in
Printf.printf "%s\n" response;
Unix.shutdown sock_fd Unix.SHUTDOWN_ALL;
Unix.close sock_fd
end
let create address port = new connection address port
end
let connection = Connection.create "54.175.219.8" 80;;
connection#get_response "GET / HTTP/1.1\r\nHost: www.httpbin.org\r\n\r\n"
As I posted before - if you find it helpful - I'd imagine a (very rough) C equivalent is something like this:
int sock_fd = socket(PF_INET, SOCK_STREAM);
setsockopt(sock_fd, TCP_NODELAY, 1);
serv_addr addr {"54.175.219.8", 80};
connect(sock_fd, &serv_addr);
write(sock_fd, "GET / HTTP/1.1\r\nHost: www.httpbin.org\r\n\r\n");
char buffer[512];
while (sock_fd = select(sock_fd, 10.0)) {
if (!sock_fd) break;
read(sock_fd, &buffer);
printf("%s\n", buffer);
flush(stdout);
}
shutdown(sock_fd, SHUTDOWN_ALL);
close(sock_fd);
When I execute this, I get extremely varied results. One time, I did actually get the entire page. But most of the time, it gets cut off about 80% of the way through the page. I tried increasing the timeout to no avail.
I thought if I polled the file descriptor I would be able to reliably know when there is no more data like this blog suggests. It seems like this method was an improvement on looping until the read size was less than the buffer_size, but I guess not? What am I missing?
UPDATE:
I edited my code to check if the read size was less than the buffer size. However, this seems redundant. If there is more to read, select will return the file descriptor. If there is no more to read, it will not and I will just return what I've read. This is the new code:
let r = Unix.read read_fd buff 0 buff_size in
let current_response = response ^ buff in
if r < buff_size
then current_response
else read_all response
But actually this is just wrong. This completely takes away the point of polling the file descriptor. Perhaps the issue is still with having read less than buff_size data... But I really have no idea for any other way I might deal with it. Whatever is read (regardless of < buff_size or not) will still be appended to the response. read_all will attempt to complete reading until select no longer returns the file descriptor, at which point, there should be nothing more to read.
FINAL SOLUTION (thanks to @ivg):
let read_from_sock () =
let buff_size = 4096 in
let buff = Bytes.create buff_size in
let rec read_all response =
let (read_fds, _, _) = Unix.select [sock_fd] [] [] read_timeout in
let rec read_all_helper current_response =
match read_fds with
| [] -> current_response
| (read_fd :: _) -> begin
let r = Unix.read read_fd buff 0 buff_size in
let current_response = response ^ (String. sub buff 0 r) in
if r < buff_size then read_all current_response
else read_all_helper current_response
end in
read_all_helper response in
read_all ""
Upvotes: 0
Views: 86
Reputation: 35240
Yes, that's actually that kind of problems, that I was expecting from your code, based on your previous post. Here is the root of the Evil:
let _ = Unix.read read_fd buff 0 buff_size in
You can't ignore the result of read, because it is not guaranteed, that read
call will read exactly buff_size
, it can return less data (so called "short read"). The same issue is with the write
call. So, you need carefully work with your buffers, to rebuild the data, after short reads. Another issue, is that calls can be interrupted with signals, but I think that you don't hit this right now.
Upvotes: 2