Reputation: 988
I have a list of URLs and I want to fetch all of the web pages. Here is what I have done:
for each url:
    getaddrinfo(hostname, port, &hints, &res); // DNS
    // create socket
    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    connect(sockfd, res->ai_addr, res->ai_addrlen);
    creatGET();
    /* for example:
       GET / HTTP/1.1\r\n
       Host: stackoverflow.cn\r\n
       ...
    */
    writeHead(); // send GET head to host
    recv(); // get the webpage content
end
I have noticed that many of the URLs are on the same host, for example:
http://job.01hr.com/j/f-6164230.html
http://job.01hr.com/j/f-6184336.html
http://www.012yy.com/gangtaiju/32692/
http://www.012yy.com/gangtaiju/35162/
So I wonder: can I connect only once to each host and then just call creatGET(), writeHead() and recv() for each URL? That may save a lot of time. So I changed my program like this:
split urls into groups by their host;
for each group:
    get hostname in the group;
    getaddrinfo(hostname, port, &hints, &res);
    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    connect(sockfd, res->ai_addr, res->ai_addrlen);
    for each url in the group:
        creatGET();
        writeHead();
        recv();
    end
end
Unfortunately, my program only gets the first web page in each group back; all the rest come back as empty files.
Am I missing something? Maybe the sockfd needs some kind of reset before each recv()?
Thank you for your generous help.
Upvotes: 1
Views: 103
Reputation: 54084
HTTP/1.1 connections are persistent, meaning that after, e.g., a POST/GET - 200 OK sequence, the next request-response sequence can reuse the already established TCP connection.
But this is not mandatory. The connection could close at any time, so you should code for that as well.
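To make "code for that" concrete, here is a minimal sketch of a read helper that reports when the server has closed the connection, so the caller knows it must reconnect before sending the next request. The name read_exact and the peer_closed flag are hypothetical and not part of the asker's program; it assumes a blocking POSIX stream socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: try to read exactly `want` bytes from `sockfd`.
 * Returns the number of bytes actually read, or -1 on a real error.
 * Sets *peer_closed to 1 if recv() returned 0, i.e. the server closed
 * the connection; the caller should then close sockfd and reconnect. */
ssize_t read_exact(int sockfd, char *buf, size_t want, int *peer_closed)
{
    size_t got = 0;

    assert(buf != NULL && peer_closed != NULL);
    *peer_closed = 0;

    while (got < want) {
        ssize_t n = recv(sockfd, buf + got, want - got, 0);
        if (n == 0) {            /* orderly shutdown by the peer */
            *peer_closed = 1;
            break;
        }
        if (n < 0) {
            if (errno == EINTR)  /* interrupted by a signal: retry */
                continue;
            return -1;           /* real error, errno is set */
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

If the flag is set, one reasonable policy is to open a new socket with getaddrinfo()/socket()/connect() as in the question and resend the request that was in flight.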
Also it seems to me that you are trying to implement your own HTTP client.
I am not sure why you would want to do that, but if you must, you should read the HTTP RFC to understand the relevant headers, so that the underlying TCP connection stays open as long as possible.
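As an illustration of the request side, here is a sketch of how a GET with the relevant headers might be built. build_get is a hypothetical stand-in for the question's creatGET(), whose real signature is not shown; the Connection: keep-alive header is redundant for HTTP/1.1 but harmless, and may help with older servers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a minimal HTTP/1.1 GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_get(char *buf, size_t size, const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    n = snprintf(buf, size,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"               /* mandatory in HTTP/1.1 */
                 "Connection: keep-alive\r\n" /* ask to keep the socket open */
                 "\r\n",                      /* empty line ends the headers */
                 path, host);

    if (n < 0 || (size_t)n >= size)
        return -1;                            /* error or truncated output */
    return n;
}
```

For the first URL in the question this would be called as build_get(buf, sizeof buf, "job.01hr.com", "/j/f-6164230.html"), and the resulting n bytes sent on the already connected socket.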
Of course, if your server is an old HTTP/1.0 server, you should not expect any connection reuse unless it is explicitly indicated via keep-alive headers.
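The rules above can be sketched as a check on the response headers. The function names here are hypothetical, and the sketch is deliberately simplified: it assumes the whole header block is already in a NUL-terminated string, treats the Connection value as a single token, and ignores Transfer-Encoding: chunked, which a real client would also have to handle to find the end of a response.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return a pointer to the value of header `name`, or NULL if absent.
 * Header names are matched case-insensitively; lines end in \r\n. */
static const char *find_header(const char *headers, const char *name)
{
    size_t len = strlen(name);
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (strncasecmp(line, name, len) == 0 && line[len] == ':') {
            const char *v = line + len + 1;
            while (*v == ' ' || *v == '\t')
                v++;                     /* skip optional whitespace */
            return v;
        }
        line = strstr(line, "\r\n");     /* advance to the next line */
        if (line != NULL)
            line += 2;
    }
    return NULL;
}

/* 1 if the server indicates the connection stays open, 0 otherwise. */
int can_reuse_connection(const char *headers)
{
    const char *conn;

    assert(headers != NULL);
    conn = find_header(headers, "Connection");
    if (conn != NULL && strncasecmp(conn, "close", 5) == 0)
        return 0;                        /* server will close the socket */
    if (conn != NULL && strncasecmp(conn, "keep-alive", 10) == 0)
        return 1;                        /* explicit keep-alive */
    /* No Connection header: persistent only for HTTP/1.1. */
    return strncmp(headers, "HTTP/1.1", 8) == 0;
}

/* Body length from Content-Length, or -1 if missing or malformed. */
long content_length(const char *headers)
{
    const char *v = find_header(headers, "Content-Length");
    char *end;
    long n;

    if (v == NULL)
        return -1;
    n = strtol(v, &end, 10);
    return (end == v || n < 0) ? -1 : n;
}
```

With these, a client can read exactly content_length() body bytes for one response, then send the next request on the same socket only if can_reuse_connection() returns 1, and reconnect otherwise.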
Upvotes: 2