Abs

Reputation: 57976

PHP Garbage collection sucks or is it just me?

I have the function below which I call very frequently in a loop.

I waited 5 minutes as the memory climbed from 1MB to 156MB. Shouldn't PHP's garbage collector kick in and reduce this at some point?

Is it because I have set memory limit at 256MB?

At echo points 2, 3 and 4 the memory usage is pretty constant. It goes down by half a meg at point 4. But point 1 is where the main memory increase happens, probably because file_get_html loads the HTML file into memory.

I thought the clear and unset of the variable $html would take care of this?

function get_stuff($link, $category ){

    $html = file_get_html(trim("$link"));

    $article = $html->find('div[class=searchresultsWidget]', 0);

    echo '1 - > '.convert(memory_get_usage(true)).'<br />';  

    foreach($article->find('h4 a') as $link){

        $next_url = 'http://new.mysite.com'.$link->href;

        $font_name = trim($link->plaintext);        

        $html = file_get_html(trim("$next_url"));

        $article = $html->find('form[class=addtags]', 0);

        $font_tags = '';

        foreach($article->find('ul[class=everyone_tags] li a span') as $link){

            $font_tags .= trim($link->innertext).',';   

        }

        echo '2 - > '.convert(memory_get_usage(true)).'<br />'; 

        $font_name = mysql_real_escape_string($font_name);
        $category =  mysql_real_escape_string($category);  
        $font_tags = mysql_real_escape_string($font_tags);  

        $sql = "INSERT INTO tag_data (font_name, category, tags) VALUES ('$font_name', '$category', '$font_tags')";

        unset($font_tags);
        unset($font_name);
        unset($category); 

        $html->clear();   

        mysql_query($sql); 

        unset($sql);   

        echo '3 - > '.convert(memory_get_usage(true)).'<br />';    

    }

    unset($next_url);
    unset($link);
    $html->clear(); 
    unset($html);   
    unset($article);

    echo '4 - > '.convert(memory_get_usage(true)).'<br />';

}

As you can see, I made a feeble attempt to use unset, although it's no good, as I understand it won't free the memory as soon as I call it.

Thanks all for any help on how I can reduce this upward rise of memory.

Upvotes: 7

Views: 1896

Answers (3)

Marc B

Reputation: 360882

PHP didn't have a proper garbage collector until 5.3. It basically used only reference counting, which would leave circular references in place until the script terminated (e.g. $a = array(); $a[] =& $a; is circular). As well, the cleanup code it DID have would only run if memory pressure required it to, e.g. there's no point in doing an expensive cleanup cycle if the newly freed memory isn't needed.

As of 5.3, there's a proper garbage collector: you can enable it with gc_enable() and force it to run with gc_collect_cycles().
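
As a rough illustration, the sketch below creates objects that point to themselves (a cycle that reference counting alone cannot free) and then asks the collector to run. The helper make_cycle() and the variable names are made up for this example, and it assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper: builds an object whose property points back at itself.
function make_cycle() {
    $a = new stdClass();
    $a->payload = str_repeat('x', 1024);
    $a->self = $a;      // circular reference: the refcount never drops to 0
}                       // $a goes out of scope here, but the cycle keeps it alive

gc_enable();            // turn the cycle collector on (it is on by default)

for ($i = 0; $i < 100; $i++) {
    make_cycle();
}

// Force a collection pass; the return value is the number of collected cycles.
$collected = gc_collect_cycles();
echo "collected: $collected\n";
```

Calling gc_collect_cycles() inside a long-running loop (say, every few hundred iterations) is one way to keep such cycles from piling up, at the cost of some CPU time per call.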

Upvotes: 2

jasonbar

Reputation: 13461

There's a known memory leak with file_get_html(): http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak

The solution is to use

$html->clear();

Which you are doing, BUT: you're using $html both inside and outside of the loop. Inside the loop you are calling $html->clear(), and then near the end of your function $html->clear() again (I assume to catch your initial file_get_html() object reference). That last call doesn't do anything, because by then $html points to the last page loaded inside the loop, which has already been cleared. You're leaking memory with the initial $html = file_get_html() call.

Try using a different variable ($html1, maybe?) inside your loop and see what happens.
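
To show the effect without the real parser, here is a minimal sketch with a made-up stand-in class, FakeDom. It assumes the parser's objects hold a circular reference that only clear() breaks, as the FAQ linked above describes; FakeDom, its $live counter and the loop counts are all hypothetical.

```php
<?php
// Hypothetical stand-in for a parsed document: it holds a cycle
// (document -> child -> document) that only clear() breaks.
class FakeDom {
    public static $live = 0;            // instances not yet destroyed
    public $child;
    public function __construct() {
        self::$live++;
        $this->child = new stdClass();
        $this->child->parent = $this;   // circular reference
    }
    public function clear() {
        if ($this->child !== null) {
            $this->child->parent = null; // break the cycle
            $this->child = null;
        }
    }
    public function __destruct() {
        self::$live--;
    }
}

gc_disable();  // keep the cycle collector out of the picture for this demo

// Pattern from the question: the outer handle is overwritten inside the loop.
$html = new FakeDom();                  // outer document
for ($i = 0; $i < 3; $i++) {
    $html = new FakeDom();              // outer document can no longer be cleared
    $html->clear();
}
$html->clear();                         // clears the last inner document again
unset($html);
$leaked_when_reused = FakeDom::$live;

// Suggested pattern: a separate variable inside the loop.
$before = FakeDom::$live;
$html = new FakeDom();
for ($i = 0; $i < 3; $i++) {
    $html1 = new FakeDom();
    $html1->clear();
    unset($html1);
}
$html->clear();                         // now this really clears the outer document
unset($html);
$leaked_when_separate = FakeDom::$live - $before;

echo "reused variable: $leaked_when_reused, separate variable: $leaked_when_separate\n";
```

With the reused variable, one object (the outer document) is left alive; with a separate loop variable, none are.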

Upvotes: 8

Artefacto

Reputation: 97845

The purpose of the garbage collector is solely to catch circular references.

If there are none, the variables are immediately eliminated once their reference count hits 0.

I don't recommend that you use unset, except in exceptional cases. Use functions instead and rely on the variables to go out of scope to have the memory reclaimed.
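
A small sketch of what that means in practice, using a made-up Tracker class whose destructor records when an instance is freed (the class and variable names are only for illustration):

```php
<?php
// Hypothetical class that logs its name when it is destroyed.
class Tracker {
    public static $log = array();
    private $name;
    public function __construct($name) { $this->name = $name; }
    public function __destruct() { self::$log[] = $this->name; }
}

// The local variable is the only reference, so the object is destroyed
// as soon as the function returns; no unset() is needed.
function work() {
    $t = new Tracker('local');
    return 42;
}

work();
$destroyed_after_call = in_array('local', Tracker::$log, true);

// Dropping the last reference by hand has the same immediate effect.
$u = new Tracker('manual');
$u = null;
$destroyed_after_null = in_array('manual', Tracker::$log, true);

var_dump($destroyed_after_call, $destroyed_after_null);
```

Neither case involves the garbage collector: the destructor runs the moment the reference count reaches zero.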

Other than that, we can't possibly tell you exactly what's happening, because we'd have to know exactly what the simple DOM parser is doing. Possibly there are circular references or global resources holding a reference, but it would be difficult to know.

See reference counting basics and collecting cycles.

Upvotes: 3
