Flybywind

Reputation: 988

Can I keep a connection open to an HTTP server?

I have a list of URLs and I want to download all of the web pages. Here is what I have done:

 for each url:
     getaddrinfo(hostname, port, &hints, &res);         // DNS
     // create socket 
     sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
     connect(sockfd, res->ai_addr, res->ai_addrlen);
     creatGET();
     /* for example:
        GET / HTTP/1.1\r\n
        Host: stackoverflow.cn\r\n
        ...
      */
     writeHead();   // send GET head to host
     recv();        // get the webpage content   
end

I noticed that many of the URLs are on the same host, for example:

 http://job.01hr.com/j/f-6164230.html
 http://job.01hr.com/j/f-6184336.html
 http://www.012yy.com/gangtaiju/32692/
 http://www.012yy.com/gangtaiju/35162/

So I wondered: can I connect to each host only once and then just call creatGET(), writeHead() and recv() for each URL? That could save a lot of time, so I changed my program like this:

split url into groups by their host;
for each group:
    get hostname in the group;
    getaddrinfo(hostname, port, &hints, &res);         
    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    connect(sockfd, res->ai_addr, res->ai_addrlen);
    for each url in the group:        
        creatGET();
        writeHead(); 
        recv();
    end
end

Unfortunately, my program only gets the first web page of each group back; all the others come back as empty files. Am I missing something? Does sockfd need some kind of reset before each recv()?

Thank you for your generous help.

Upvotes: 1

Answers (1)

Cratylus

Reputation: 54084

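HTTP/1.1 connections are persistent by default: after one request/response exchange (e.g. a GET or POST answered with 200 OK), the next request/response exchange can reuse the already-established TCP connection.
But this is not guaranteed. Either side may close the connection at any time (for example, the server may answer with Connection: close, or drop a connection that has been idle for too long), so your code has to handle that case as well and reconnect when it happens.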
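Coding for a persistent connection also means reading each response exactly up to its end. A common pitfall (possibly what happened in the question, although its recv() is not shown) is reading until recv() returns 0: on a persistent connection that only happens once the server closes the socket, and after that every later request on the same socket gets nothing back. Below is a minimal sketch, assuming the server sends a Content-Length header; chunked transfer coding, 1xx responses and bodiless responses (HEAD, 204, 304) are not handled. The name read_response and its return codes are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: reads exactly one HTTP response from fd.
 * Assumes a Content-Length header; chunked bodies, 1xx and bodiless
 * responses (HEAD, 204, 304) are NOT handled in this sketch.
 * Returns the body length (body is NUL-terminated) on success,
 * -2 if the peer closed the connection before sending anything
 * (reconnect and resend the request), -1 on any other error. */
ssize_t read_response(int fd, char *body, size_t body_size)
{
    char hdr[8192];
    size_t used = 0;
    char *end = NULL;

    /* Read the header block one byte at a time so that no bytes of the
     * body (or of the next response) are consumed here. */
    while (used < sizeof hdr - 1) {
        ssize_t r = recv(fd, hdr + used, 1, 0);
        if (r == 0)
            return used == 0 ? -2 : -1;   /* peer closed the connection */
        if (r < 0)
            return -1;
        hdr[++used] = '\0';
        if (used >= 4 && memcmp(hdr + used - 4, "\r\n\r\n", 4) == 0) {
            end = hdr + used;
            break;
        }
    }
    if (end == NULL)
        return -1;                        /* header block too large */

    /* Look for Content-Length; header names are case-insensitive. */
    long len = -1;
    for (char *line = strstr(hdr, "\r\n"); line != NULL && line + 2 < end;
         line = strstr(line + 2, "\r\n")) {
        if (strncasecmp(line + 2, "Content-Length:", 15) == 0) {
            len = strtol(line + 2 + 15, NULL, 10);
            break;
        }
    }
    if (len < 0 || (size_t)len >= body_size)
        return -1;                        /* no length, or body too large */

    /* recv() may return fewer bytes than requested, so loop. */
    size_t got = 0;
    while (got < (size_t)len) {
        ssize_t r = recv(fd, body + got, (size_t)len - got, 0);
        if (r <= 0)
            return -1;                    /* closed or failed mid-body */
        got += (size_t)r;
    }
    body[got] = '\0';
    return (ssize_t)got;
}
```

Reading the headers byte by byte is slow but keeps the sketch from swallowing the start of the next response; a real client would read into a buffer and carry any leftover bytes over to the next call. When the function returns -2, the server has closed the socket, so the caller should reconnect and resend the same request.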

Also, it seems to me that you are trying to implement your own HTTP client.
I am not sure why you would want to do that, but if you must, you should read the HTTP/1.1 specification (RFC 9110 and RFC 9112, which replace the older RFC 2616) to understand the relevant headers, such as Connection, Content-Length and Transfer-Encoding, so that the underlying TCP connection stays open as long as possible and you know where each response ends.
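On the request side, what matters is mostly what the request does not say: under HTTP/1.1, a request without Connection: close lets the server keep the socket open. The question elides the rest of its headers, so this is only a guess, but if they include Connection: close, or if the request line says HTTP/1.0, the server would be expected to close after the first response. A minimal sketch of a request builder, where build_get is a hypothetical stand-in for the question's creatGET():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper standing in for the question's creatGET():
 * formats one HTTP/1.1 GET request into buf. It deliberately sends no
 * "Connection: close" header, so the server may keep the TCP
 * connection open for the next request on the same socket.
 * Returns the request length, or -1 on bad input or a too-small buf. */
int build_get(char *buf, size_t size, const char *host, const char *path)
{
    /* CR or LF inside host/path would break the request framing. */
    if (strpbrk(host, "\r\n") != NULL || strpbrk(path, "\r\n") != NULL)
        return -1;

    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* output was truncated */
    return n;
}
```

Each request built this way is written to the same sockfd in turn, and its response is read to the end before the next request is sent.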

Of course, if your server only speaks HTTP/1.0, you should not expect the connection to be reused unless keep-alive is explicitly indicated with a Connection: keep-alive header.
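The reuse rules above can be condensed into a small decision function. This is a minimal sketch that only inspects the response; a strictly correct HTTP/1.0 client would also have to send Connection: keep-alive in its own request. The name can_reuse and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: decides whether the client may send another
 * request on the same connection after reading a response.
 *   minor_version: y from the response's "HTTP/1.y" status line
 *   connection:    value of the Connection header, or NULL if absent
 * Simplification: the header value is compared as one token; a list
 * such as "keep-alive, Upgrade" is not parsed. */
bool can_reuse(int minor_version, const char *connection)
{
    /* An explicit "close" ends the connection for any version. */
    if (connection != NULL && strcasecmp(connection, "close") == 0)
        return false;
    /* HTTP/1.1 and later: persistent unless told otherwise. */
    if (minor_version >= 1)
        return true;
    /* HTTP/1.0: persistent only when keep-alive is explicitly indicated. */
    return connection != NULL && strcasecmp(connection, "keep-alive") == 0;
}
```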

Upvotes: 2
