Reputation: 191
I want to download a webpage's source the same way a web browser does when you save the page as .xml or use View Page Source. The pages I am targeting are XML (XHTML) and start like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Why do I want to do this? I want to dump the entire source of certain webpages into a string or CString, which I'm still figuring out how to do.
Upvotes: 0
Views: 2893
Reputation: 622
Since you mentioned Visual C++, a good option is the recently published Casablanca HTTP library from Microsoft Research, provided you can use C++11 as well.
http://msdn.microsoft.com/en-us/devlabs/casablanca.aspx
Then you need an HTTP client, similar to the one described in this tutorial: http://msdn.microsoft.com/en-US/devlabs/hh977106.aspx
It can look something like this:
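    // Assumes the Casablanca http client namespaces and std are in scope.
    http_client client( L"http://somewebsite.com" );
    client.request( methods::GET, L"page-to-download.html" )
        .then( []( http_response response ) {
            // to_string() returns a wide string on Windows (headers plus body
            // where possible), so print it with wcout rather than cout.
            wcout << L"HTML SOURCE:" << endl << response.to_string() << endl; })
        .wait();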
Upvotes: 2
Reputation: 103713
Using libcurl:
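    #include <stdexcept>
    #include <string>
    #include <curl/curl.h>

    // Write callback: libcurl calls this for each chunk of the response body.
    // vstring is the std::string registered with CURLOPT_WRITEDATA.
    size_t AppendDataToStringCurlCallback(void *ptr, size_t size, size_t nmemb, void *vstring)
    {
        std::string *pstring = static_cast<std::string *>(vstring);
        pstring->append(static_cast<char *>(ptr), size * nmemb);
        return size * nmemb; // returning fewer bytes would abort the transfer
    }

    std::string DownloadUrlAsString(const std::string &url)
    {
        std::string body;
        // Ideally call curl_global_init once at program startup instead.
        curl_global_init(CURL_GLOBAL_ALL);
        CURL *curl_handle = curl_easy_init();
        if (!curl_handle)
            throw std::runtime_error("curl_easy_init failed");
        curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, AppendDataToStringCurlCallback);
        curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &body);
        CURLcode res = curl_easy_perform(curl_handle);
        curl_easy_cleanup(curl_handle);
        if (res != CURLE_OK)
            throw std::runtime_error(curl_easy_strerror(res));
        return body;
    }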
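If it is unclear how the page ends up in one string: libcurl delivers the body in chunks and calls the write callback once per chunk, so the callback just keeps appending. Below is a rough sketch of that accumulation with no network and no libcurl involved. `AppendChunk` and `SimulateTransfer` are hypothetical names made up for this illustration; only the callback signature mirrors what libcurl expects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same shape as a libcurl write callback: append size * nmemb bytes to the
// std::string passed through the user pointer and report how many were consumed.
size_t AppendChunk(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    std::string *out = static_cast<std::string *>(userdata);
    out->append(static_cast<const char *>(ptr), size * nmemb);
    return size * nmemb;
}

// Hypothetical stand-in for a transfer (not a libcurl function): hands each
// chunk to the callback in order and stops if the callback consumes fewer
// bytes than it was given.
std::string SimulateTransfer(const std::vector<std::string> &chunks)
{
    std::string body;
    for (const std::string &chunk : chunks) {
        std::string copy = chunk; // the callback takes a non-const pointer
        size_t consumed = AppendChunk(&copy[0], 1, copy.size(), &body);
        if (consumed != copy.size())
            break; // a real transfer would be aborted with a write error here
    }
    return body;
}
```

Feeding it the two pieces `<?xml version="1.0"` and ` encoding="UTF-8"?>` gives back the joined declaration, which is the same thing that happens to `body` in `DownloadUrlAsString` as the real chunks arrive.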
Upvotes: 1