How to obtain the URL of re-directed webpage in C++

Question

I wrote some C++ code that automatically parses a webpage, then opens and parses some of its links. The problem is that some of the addresses on these pages redirect to other pages. For example, when I try to open:

https://atlas.immobilienscout24.de/property-by-address?districtId=1276001006014

I ended up opening:

https://atlas.immobilienscout24.de/orte/deutschland/baden-württemberg/böblingen-kreis/leonberg

How can I get the URL of the second page in C++?

hanshenrik · Accepted Answer

You could use CURLOPT_HEADERFUNCTION to inspect the response headers and parse out the Location header, e.g.:

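#include <curl/curl.h>
#include <cctype>
#include <iostream>
#include <string>
// libcurl calls this once per response header line; the line is size*nitems bytes and is not NUL-terminated
size_t header_callback(char *buffer,   size_t size,   size_t nitems,   void *userdata){
  const size_t len = size * nitems;
  const std::string needle = "location:";
  if(len > needle.size()){
    // header names are case-insensitive (HTTP/2 delivers them in lowercase), so compare in lowercase
    bool match = true;
    for(size_t i = 0; i < needle.size(); ++i){
      if(std::tolower((unsigned char)buffer[i]) != needle[i]){ match = false; break; }
    }
    if(match){
      // skip the whitespace after the colon and drop the trailing "\r\n"
      size_t begin = needle.size(), end = len;
      while(begin < end && (buffer[begin] == ' ' || buffer[begin] == '\t')) ++begin;
      while(end > begin && std::isspace((unsigned char)buffer[end-1])) --end;
      ((std::string*)userdata)->assign(buffer + begin, end - begin);
    }
  }
  return len; // any other return value makes libcurl abort the transfer
}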
int main(int argc, char *argv[])
{
  CURLcode ret;
  CURL *hnd = curl_easy_init();
  curl_easy_setopt(hnd, CURLOPT_URL, "https://atlas.immobilienscout24.de/property-by-address?districtId=1276001006014");
  curl_easy_setopt(hnd, CURLOPT_NOPROGRESS, 1L);
  curl_easy_setopt(hnd, CURLOPT_NOBODY, 1L);
  std::string redirect_url;
  curl_easy_setopt(hnd,CURLOPT_HEADERDATA,&redirect_url);
  curl_easy_setopt(hnd,CURLOPT_HEADERFUNCTION,header_callback);
  ret = curl_easy_perform(hnd);
  curl_easy_cleanup(hnd);
  hnd = NULL;
  std::cout << redirect_url;
  return (int)ret;
}
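
One caveat with parsing the header yourself: the Location value may be a relative reference (for example just a path) rather than a full URL, so it may need to be resolved against the URL you requested. Below is a minimal sketch of that step. resolve_location is a hypothetical helper, not part of libcurl, and it is deliberately simplified: it handles absolute URLs, scheme-relative, absolute-path, query-only and plain relative references, but does not normalize "." or ".." segments.

```cpp
#include <string>

// Hypothetical helper (not a libcurl function): resolve a Location header
// value against the URL that was requested. Simplified on purpose: no
// "." / ".." normalization and no validation of either input.
std::string resolve_location(const std::string &base, const std::string &location)
{
  const std::string::size_type npos = std::string::npos;
  if (location.empty()) return base;

  // A scheme ("https:") before any '/', '?' or '#' means location is already absolute.
  const std::string::size_type colon = location.find(':');
  const std::string::size_type delim = location.find_first_of("/?#");
  if (colon != npos && colon > 0 && (delim == npos || colon < delim)) return location;

  const std::string::size_type scheme_end = base.find("://");
  if (scheme_end == npos) return location; // base is not an absolute URL; give up

  // Scheme-relative reference: reuse only the scheme of the base URL.
  if (location.compare(0, 2, "//") == 0) return base.substr(0, scheme_end + 1) + location;

  // Absolute path: keep scheme and host of the base URL, replace the rest.
  if (location[0] == '/') {
    const std::string::size_type path_start = base.find_first_of("/?#", scheme_end + 3);
    return base.substr(0, path_start) + location;
  }

  // Fragment-only reference: replace only the fragment of the base URL.
  if (location[0] == '#') return base.substr(0, base.find('#')) + location;

  // Everything below ignores the query string and fragment of the base URL.
  const std::string base_no_query = base.substr(0, base.find_first_of("?#"));
  if (location[0] == '?') return base_no_query + location;

  // Plain relative path: replace the last path segment of the base URL.
  const std::string::size_type last_slash = base_no_query.rfind('/');
  if (last_slash == npos || last_slash < scheme_end + 3) return base_no_query + "/" + location;
  return base_no_query.substr(0, last_slash + 1) + location;
}
```

With the program above you would call something like resolve_location(requested_url, redirect_url) after curl_easy_perform returns, where requested_url is whatever string you passed to CURLOPT_URL. If the server already sends an absolute URL, the helper returns it unchanged.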

However, if you want the final URL (in case of multiple redirects) rather than just the second URL, you should probably use CURLOPT_FOLLOWLOCATION and CURLINFO_EFFECTIVE_URL instead, e.g.:

#include <curl/curl.h>
#include <iostream>
#include <string>
int main(int argc, char *argv[])
{
  CURLcode ret;
  CURL *hnd = curl_easy_init();
  curl_easy_setopt(hnd, CURLOPT_URL, "https://atlas.immobilienscout24.de/property-by-address?districtId=1276001006014");
  curl_easy_setopt(hnd, CURLOPT_NOPROGRESS, 1L);
  curl_easy_setopt(hnd, CURLOPT_NOBODY, 1L);
  curl_easy_setopt(hnd,CURLOPT_FOLLOWLOCATION,1L);
  ret = curl_easy_perform(hnd);
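  char *lolc = NULL;
  curl_easy_getinfo(hnd, CURLINFO_EFFECTIVE_URL, &lolc);
  // copy the string before curl_easy_cleanup frees it; lolc stays NULL if getinfo failed
  std::string final_url(lolc ? lolc : "");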
  curl_easy_cleanup(hnd);
  hnd = NULL;
  std::cout << final_url;
  return (int)ret;
}

This approach is slower (each redirect costs at least one extra request), but it is much simpler to implement and works the same way for non-redirected, redirected, and multiply redirected URLs.
