minusatwelfth

Reputation: 191

How do I download XML from the internet in C++?

I want to do it the same way a web browser does when you save a page as .xml or view the page source. The page I am targeting is XML and starts like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Why do I want to do this? I want to dump the entire source of certain webpages into a string or CString, which I'm still figuring out how to do.

Upvotes: 0

Views: 2893

Answers (2)

Paulo Pinto

Reputation: 622

Since you mentioned Visual C++, a good solution would be to make use of the recently published HTTP Casablanca library from Microsoft Research, provided you are able to use C++11 as well.

http://msdn.microsoft.com/en-us/devlabs/casablanca.aspx

Then you need to make use of an HTTP client, similar to what is described in this tutorial: http://msdn.microsoft.com/en-US/devlabs/hh977106.aspx

It can look something like this:

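http_client client( L"http://somewebsite.com" );

// Issue a GET request and print the response once it arrives.
client.request( methods::GET, L"page-to-download.html" )
    .then( []( http_response response ) {
        // to_string() yields a wide string on Windows, so print it with wcout.
        wcout << L"HTML SOURCE:" << endl << response.to_string() << endl; })
    .wait();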

Upvotes: 2

Benjamin Lindley

Reputation: 103713

Using libcurl:

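#include <string>
#include <curl/curl.h>

// libcurl calls this once per received chunk; append the chunk to the target string.
size_t AppendDataToStringCurlCallback(void *ptr, size_t size, size_t nmemb, void *vstring)
{
    std::string * pstring = static_cast<std::string*>(vstring);
    pstring->append(static_cast<char*>(ptr), size * nmemb);
    return size * nmemb;
}

// Returns the response body, or an empty string if the transfer fails.
std::string DownloadUrlAsString(const std::string & url)
{
    std::string body;

    // Ideally curl_global_init is called once at program startup rather than per download.
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl_handle = curl_easy_init();
    if (!curl_handle)
        return body;

    curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, AppendDataToStringCurlCallback);
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &body);
    // Discard any partial data if the transfer did not complete successfully.
    if (curl_easy_perform(curl_handle) != CURLE_OK)
        body.clear();
    curl_easy_cleanup(curl_handle);

    return body;
}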
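
To see why the callback appends rather than assigns: libcurl may deliver the response body in several chunks, calling the write callback once per chunk. The sketch below imitates that without any network access or libcurl dependency. SimulateTransfer is a hypothetical helper written only for this illustration and is not part of libcurl; the callback has the same signature as the one above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same signature that libcurl expects for CURLOPT_WRITEFUNCTION.
size_t AppendDataToStringCurlCallback(void *ptr, size_t size, size_t nmemb, void *vstring)
{
    std::string *pstring = static_cast<std::string*>(vstring);
    pstring->append(static_cast<char*>(ptr), size * nmemb);
    return size * nmemb;
}

// Hypothetical stand-in for a transfer: hands each chunk to the callback
// in order, the way libcurl would as data arrives.
std::string SimulateTransfer(const std::vector<std::string> &chunks)
{
    std::string body;
    for (std::string chunk : chunks) {  // copy, so the buffer is mutable
        size_t consumed = AppendDataToStringCurlCallback(&chunk[0], 1, chunk.size(), &body);
        // libcurl treats a return value different from the chunk size as an error.
        assert(consumed == chunk.size());
        (void)consumed;  // silence unused-variable warnings when asserts are disabled
    }
    return body;
}
```

Calling SimulateTransfer with the two pieces `<?xml version="1.0"` and ` encoding="UTF-8"?>` returns the complete declaration as a single std::string, which is what DownloadUrlAsString relies on when a real page arrives in pieces.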

Upvotes: 1
