keji

Reputation: 5990

How to Implement Workaround for CURLOPT_RETURNTRANSFER

I'm using this code http://martinsikora.com/how-to-steal-google-s-did-you-mean-feature to add a "did you mean" feature to my search, but my hosting provider has open_basedir set and won't let me change it, which means CURLOPT_FOLLOWLOCATION cannot be enabled. I've seen a couple of workarounds, but I don't know how to apply them to this piece of code.

Here is the snippet:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agents[rand(0, count($agents) - 1)]);
$data = curl_exec($ch);
curl_close($ch);

Upvotes: 0

Views: 841

Answers (1)

DaveRandom

Reputation: 88677

What a bizarre and annoying (and basically undocumented) restriction, especially when it can so easily be worked around. All you need to do is check for 3xx response codes, then examine the contents of the Location: header to find the URL you are being redirected to.

This is not as trivial as one might like it to be, because many applications violate the RFC and do not put a full URL in the Location header, so you need to do a bit of fudging to work out the right location.
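To give a feel for the forms a Location value can take, here is a rough sketch that pulls the header out of a raw header block and labels it. The function name classify_location() and the category labels are made up for illustration, and the sample headers in the usage note are invented rather than captured from a real server.

```php
<?php
// Hypothetical helper, for illustration only: extract the Location header
// from a raw header block and report which form its value takes.
function classify_location($rawHeaders) {
  // Header names are case-insensitive; the m flag makes ^ and $ match per line
  if (!preg_match('/^location:\s*(.+?)\s*$/mi', $rawHeaders, $matches)) {
    return array(null, 'missing');
  }

  $location = $matches[1];
  $parts = parse_url($location);

  if ($parts === false) {
    return array($location, 'malformed');
  }
  if (!empty($parts['scheme'])) {
    // A full URL such as http://example.com/a, which can be used as-is
    return array($location, 'absolute');
  }
  if (!empty($parts['host'])) {
    // //example.com/a keeps the scheme of the original request
    return array($location, 'scheme-relative');
  }
  if ($location[0] === '/') {
    // /a/b replaces the whole path on the original host
    return array($location, 'root-relative');
  }
  // a/b is resolved against the directory of the original path
  return array($location, 'path-relative');
}
```

Calling classify_location("HTTP/1.1 302 Found\r\nLocation: next.php\r\n") should yield the value next.php with the label path-relative. Only the absolute form can be requested directly; every other form has to be combined with the URL that produced the redirect.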

Something like this should work for your code (untested):

function make_url_from_location ($oldUrl, $locationHeader) {
  // Takes a URL and a location header and calculates the new URL
  // This takes relative paths (which are non-RFC compliant) into
  // account, which most browsers will do. Requires $oldUrl to be
  // a full URL

  // First check if $locationHeader is a full URL
  $newParts = parse_url($locationHeader);
  if (!empty($newParts['scheme'])) {
    return $locationHeader;
  }

  // We need a path at a minimum. If not, return the old URL.
  if (empty($newParts['path'])) {
    return $oldUrl;
  }

  // Construct the start of the new URL
  $oldParts = parse_url($oldUrl);
  $newUrl = $oldParts['scheme'].'://'.$oldParts['host'];
  if (!empty($oldParts['port'])) {
    $newUrl .= ':'.$oldParts['port'];
  }

  // Build new path
  if ($newParts['path'][0] == '/') {
    $newUrl .= $newParts['path'];
  } else {
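    // Resolve a relative path against the directory of the old path. A
    // trailing slash means the old path is already a directory. The
    // str_replace() works around Windows, where dirname() can return a \
    $oldPath = empty($oldParts['path']) ? '/' : $oldParts['path'];
    $oldDir = substr($oldPath, -1) == '/' ? $oldPath : dirname($oldPath);
    $newUrl .= rtrim(str_replace('\\', '/', $oldDir), '/').'/'.$newParts['path'];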
  }

  // Add a query string
  if (!empty($newParts['query'])) {
    $newUrl .= '?'.$newParts['query'];
  }

  return $newUrl;

}

$maxRedirects = 30;

$redirectCount = 0;
$complete = FALSE;

// Get user agent string once at start - array_rand() is tidier
// For these purposes, a single static string will probably be fine
$userAgent = $agents[array_rand($agents)];

do {

  // Make the request
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);
  curl_setopt($ch, CURLOPT_HEADER, true);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
  $data = curl_exec($ch);

  // Get the response code (easier than parsing it from the headers)
  $responseCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

  // Split header from body (the body may be missing, e.g. on a redirect)
  $data = explode("\r\n\r\n", $data, 2);
  $header = $data[0];
  $data = isset($data[1]) ? $data[1] : '';

  // Check for redirect response codes
  if ($responseCode >= 300 && $responseCode < 400) {

    if (preg_match('/^location:\s*(.+?)$/mi', $header, $matches)) {
      // Get URL for next iteration
      $url = make_url_from_location(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), trim($matches[1]));
    } else {
      // This is an error. If you get here the response was a 3xx code and
      // no location header was set. You need to handle that error here.
      $complete = TRUE;
    }

  } else {

    // Non redirect response code (might still be an error code though!)
    $complete = TRUE;

  }

// Loop until no more redirects or $maxRedirects is reached
} while (!$complete && ++$redirectCount < $maxRedirects);

// Perform whatever error checking is necessary here

// Close the cURL handle
curl_close($ch);
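
For the error-checking step, something along these lines might serve as a starting point. It is only a sketch: describe_fetch_problem() is a made-up name, the messages are placeholders, and it assumes you capture curl_errno($ch) before calling curl_close($ch).

```php
<?php
// Hypothetical helper, for illustration only: turn the state left behind by
// the redirect loop into an error message, or null if nothing looks wrong.
function describe_fetch_problem($curlErrno, $responseCode, $redirectCount, $maxRedirects) {
  // A non-zero cURL error number means the last request itself failed
  // (DNS failure, timeout, refused connection and so on)
  if ($curlErrno !== 0) {
    return 'cURL transport error number '.$curlErrno;
  }

  // Still on a 3xx after the loop: either the redirect limit was reached
  // or the response carried no usable Location header
  if ($responseCode >= 300 && $responseCode < 400) {
    if ($redirectCount >= $maxRedirects) {
      return 'Gave up after '.$maxRedirects.' redirects';
    }
    return 'Redirect response without a usable Location header';
  }

  // 4xx and 5xx responses end the loop normally but are still failures
  if ($responseCode >= 400) {
    return 'Server returned HTTP '.$responseCode;
  }

  return null;
}

// Assumed usage, placed just before curl_close($ch):
//   $problem = describe_fetch_problem(curl_errno($ch), $responseCode, $redirectCount, $maxRedirects);
//   if ($problem !== null) { /* log it, fall back, or bail out */ }
```

Passing plain values rather than the cURL handle keeps the check easy to try out without making a request.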

Upvotes: 1
