Reputation: 11
I need to download an HTML page in chunks. I have built a GET request that can download a certain range of data, but I have been unsuccessful in doing this repeatedly. Basically, I have to receive bytes 0-99 first, then 100-199, and so on. I would also be grateful to know how to find out the exact size of the file being received beforehand, using C or C++ code. Following is my code. I have omitted connecting to sockets etc., as that has been done successfully.
int c=0,s=0;
while(1)
{
    get = build_get_query(host, page,s);
    c+=1;
    fprintf(stderr, "Query is:\n<<START>>\n%s<<END>>\n", get);
    //Send the query to the server
    int sent = 0;
    cout<<"sending "<<c<<endl;
    while(sent < strlen(get))
    {
        tmpres = send(sock, get+sent, strlen(get)-sent, 0);
        if(tmpres == -1)
        {
            perror("Can't send query");
            exit(1);
        }
        sent += tmpres;
    }
    //now it is time to receive the page
    memset(buf, 0, sizeof(buf));
    int htmlstart = 0;
    char * htmlcontent;
    cout<< "receiving "<<c<<endl;
    while((tmpres = recv(sock, buf, BUFSIZ, 0)) > 0)
    {
        if(htmlstart == 0)
        {
            /* Under certain conditions this will not work.
             * If the \r\n\r\n part is split across two messages
             * it will fail to detect the beginning of HTML content
             */
            htmlcontent = strstr(buf, "\r\n\r\n");
            if(htmlcontent != NULL)
            {
                htmlstart = 1;
                htmlcontent += 4;
            }
        }
        else
        {
            htmlcontent = buf;
        }
        if(htmlstart)
        {
            // print via "%s" so the page content is never used as a format string
            fprintf(stdout, "%s", htmlcontent);
        }
        memset(buf, 0, tmpres);
    }
    if(tmpres < 0)
    {
        perror("Error receiving data");
    }
    s+=100;
    if(c==5)
        break;
}
char *build_get_query(char *host, char *page,int i)
{
    char *query;
    char *getpage = page;
    int j=i+99;
    char tpl[100] = "GET /%s HTTP/1.1\r\nHost: %s\r\nRange: bytes=%d-%d\r\nUser-Agent: %s\r\n\r\n";
    if(getpage[0] == '/')
    {
        getpage = getpage + 1;
        fprintf(stderr,"Removing leading \"/\", converting %s to %s\n", page, getpage);
    }
    query = (char *)malloc(strlen(host)+strlen(getpage)+8+strlen(USERAGENT)+strlen(tpl)-5);
    sprintf(query, tpl, getpage, host, i , j, USERAGENT);
    return query;
}
Upvotes: 1
Views: 1463
Reputation: 123531
I would also be grateful to know how to find out the exact size of the file being received beforehand, using C or C++ code.
If the server supports range requests for the specific resource (which is not guaranteed), then the response will look like this:
HTTP/1.1 206 Partial Content
Content-Range: bytes 100-199/12345
This means that the response will contain the bytes 100..199 and that the total size of the content is 12345 bytes.
There are lots of questions here that deal with parsing HTTP headers, so I will not go into detail on how to extract this data from the header in C/C++.
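For orientation only, here is a minimal sketch of pulling the three numbers out of a Content-Range header. The names ContentRange and parse_content_range are hypothetical and not part of the question's code, and the sketch assumes the header name is spelled exactly "Content-Range:" (header names are case-insensitive in HTTP, so a robust parser would compare without regard to case).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper type: the values of "Content-Range: bytes first-last/total".
struct ContentRange {
    long long first = -1;
    long long last = -1;
    long long total = -1;  // stays -1 when the server sends "*" (total size unknown)
};

// Searches a raw header block for a Content-Range header and fills `out`.
// Returns false when the header is missing or malformed.
// Assumption: exact spelling "Content-Range:"; real servers may use other casing.
bool parse_content_range(const std::string& headers, ContentRange& out)
{
    const std::string key = "Content-Range:";
    const std::string::size_type pos = headers.find(key);
    if (pos == std::string::npos)
        return false;

    ContentRange r;
    const char* value = headers.c_str() + pos + key.size();
    // The leading space in the format skips optional whitespace after the colon.
    const int n = std::sscanf(value, " bytes %lld-%lld/%lld",
                              &r.first, &r.last, &r.total);
    if (n < 2)  // at least first and last must be present
        return false;

    out = r;
    return true;
}
```

Called on the header block of the response shown above, this would yield first = 100, last = 199 and total = 12345; the total is the full size of the resource, so a single small range request is enough to learn it.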
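Please also note that you are making an HTTP/1.1 request and thus must deal with possible chunked responses and implicit keep-alive. With keep-alive the server leaves the connection open after each response, so a loop that reads until recv returns 0 will block instead of moving on to the next range. I really recommend using an existing HTTP library instead of doing it all by hand and getting it wrong. If you really want to implement it all on your own, please study the HTTP specification.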
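If you do go the by-hand route, one way to cope with keep-alive is to stop reading once the announced number of body bytes has arrived, rather than waiting for the connection to close. The following is a rough sketch under stated assumptions, not a complete HTTP client: BodyFraming, inspect_response and response_complete are hypothetical names, header names are matched with exact casing only, and chunked bodies are merely detected, not decoded. It expects the caller to append every recv result to one std::string, which also avoids the problem of "\r\n\r\n" being split across two reads.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper type: how the body of one response is delimited.
struct BodyFraming {
    std::size_t header_len = 0;     // bytes up to and including the blank line
    long long content_length = -1;  // -1 when no Content-Length header was found
    bool chunked = false;           // true when Transfer-Encoding mentions "chunked"
};

// Inspects everything received so far. Returns false while the header block
// is still incomplete, in which case the caller should recv more data.
bool inspect_response(const std::string& received, BodyFraming& out)
{
    const std::string::size_type end = received.find("\r\n\r\n");
    if (end == std::string::npos)
        return false;

    BodyFraming f;
    f.header_len = end + 4;
    const std::string headers = received.substr(0, f.header_len);

    // Assumption: exact header spelling; HTTP header names are case-insensitive.
    const std::string te_key = "Transfer-Encoding:";
    const std::string::size_type te = headers.find(te_key);
    if (te != std::string::npos) {
        const std::string::size_type eol = headers.find("\r\n", te);
        f.chunked = headers.substr(te, eol - te).find("chunked") != std::string::npos;
    }

    const std::string cl_key = "Content-Length:";
    const std::string::size_type cl = headers.find(cl_key);
    if (cl != std::string::npos)
        f.content_length = std::strtoll(headers.c_str() + cl + cl_key.size(), nullptr, 10);

    out = f;
    return true;
}

// True once the full Content-Length body has arrived. Chunked bodies are
// not handled here and always report false.
bool response_complete(const std::string& received, const BodyFraming& f)
{
    return !f.chunked && f.content_length >= 0 &&
           received.size() >= f.header_len + static_cast<std::size_t>(f.content_length);
}
```

In a receive loop you would append each recv result to the string, call inspect_response until it succeeds, and then leave the loop as soon as response_complete returns true, at which point the same socket can carry the next range request.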
Upvotes: 2