inrob

Reputation: 5089

PHP cURL memory usage

I have this function that gets the HTML from a list of pages. Once I run it for two hours or so, the script is interrupted with a message that the memory limit has been exceeded. I've tried unsetting variables and setting them to null, hoping to free up some memory, but the problem remains. Can you guys please take a look at the following piece of code?

{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'on'){
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_URL, $site);
    ob_start();
    return curl_exec($ch); // the script runs out of memory on this line
    ob_end_clean();
    curl_close($ch);

    ob_flush();
    $site = null;
    $ch = null;

}

Any suggestion is highly appreciated. I've set the memory limit to 128M, but before increasing it (which doesn't seem like the best option to me) I would like to know if there's anything I can do to use less memory, or to free up memory, while the script is running.

Thank you.

Upvotes: 3

Views: 4957

Answers (3)

ATJ

Reputation: 325

I know it's been a while, but others might run into a similar issue, so in case it helps anyone else...

To me the problem here is that cURL is set to save the output to a string, which is what curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); does. If the output gets too long, the script runs out of the memory allowed for that string and fails with an error like:

FATAL ERROR: Allowed memory size of 134217728 bytes exhausted (tried to allocate 130027520 bytes)

The way around this is to use one of the other output methods offered by cURL: output to standard output, or output to file. In either case, ob_start() shouldn't be needed at all.

Hence you could replace the content of the braces with either option below:

OPTION 1: Output to standard output:

$ch = curl_init();
if ($proxystatus == 'on'){
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
}
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_URL, $site);
curl_exec($ch); // without CURLOPT_RETURNTRANSFER, the response is written straight to standard output
curl_close($ch);

OPTION 2: Output to file:

$file = fopen("path_to_file", "w"); // place this outside the braces if you want to write all iterations to the same file
$ch = curl_init();
if ($proxystatus == 'on'){
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
}
curl_setopt($ch, CURLOPT_FILE, $file); // stream the response into the file instead of into memory
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_URL, $site);
curl_exec($ch);
curl_close($ch);
fclose($file); // place this outside the braces if you want to write all iterations to the same file
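
If you would rather process the response as it arrives instead of writing it all to a file, cURL also offers a write callback. A minimal sketch under that assumption; the callback here just appends each chunk to a file, but it could parse or discard the data instead, keeping memory usage flat:

$file = fopen("path_to_file", "a");
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_URL, $site);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) use ($file) {
    fwrite($file, $data); // handle one chunk at a time
    return strlen($data); // must return the number of bytes consumed
});
curl_exec($ch);
curl_close($ch);
fclose($file);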

Upvotes: 2

Niet the Dark Absol

Reputation: 324640

You are indeed leaking memory. Remember that return immediately ends execution of the current function, so all your cleanup (most importantly ob_end_clean() and curl_close()) is never called.

return should be the very last thing the function does.
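
A minimal sketch of the reordered function (the name and signature are assumed, since the question only shows the body; ob_start() is dropped entirely because CURLOPT_RETURNTRANSFER already captures the output):

function fetch_page($site, $proxystatus, $proxy) // hypothetical wrapper for the posted body
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'on'){
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_URL, $site);
    $html = curl_exec($ch); // capture the result instead of returning immediately
    curl_close($ch);        // the handle is now actually released
    return $html;           // return is the very last thing the function does
}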

Upvotes: 1

napolux

Reputation: 16074

For sure this is not a cURL issue. Use a tool like Xdebug to detect which part of your script is consuming memory.
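
As a quick first pass before reaching for a profiler, you can also log PHP's own memory counters between iterations. A sketch, where the loop and getPage() are placeholders for your own code:

foreach ($urls as $site) { // placeholder loop over the page list
    $html = getPage($site); // hypothetical fetch function
    // ... process $html ...
    error_log(sprintf(
        "after %s: %.1f MB in use, %.1f MB peak",
        $site,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576
    ));
}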

Btw, I would also change it so it doesn't run for two hours: I would move it to a cron job that runs every minute, checks what it needs to do, and then stops.

Upvotes: 0
