Cumulative memory use in string assignment: $a = $a . $b vs. $a .= $b

Some of you are likely familiar with how PHP handles memory in different string situations.

When a string variable is reassigned, the string is not "updated" in place; it is cloned. At least, this is my current understanding.

$a = 'a';
$b = 'b';
$a = $a . $b; // uses strlen($a)*2 + strlen($b) bytes
$a .= $b; // uses strlen($a) + strlen($b) bytes

In a template engine I am developing, this means huge memory consumption. I am using over 128 MB of memory for a page string that is, in fact, well under 512 KB. This is because the string is copied over and over again.

Simply put, these copies are made every time I do something like:

$page = str_replace($find, $replace, $page);

Generally speaking, is there a workaround that avoids creating this clone?

I benchmarked this a bit. The two scripts below produce the same output, but with completely different memory consumption: the first one consumes a huge amount of memory, while the second one only consumes about as much as the actual string size.

$iterations = 100000;
$a = 'a';
$b = 'b';
echo "start peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "start current memory usage " . (memory_get_usage()/1024).'k<br>';

for($i = 0; $i<$iterations; $i++) {
    $a = $a . $b;
}
echo "end peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "end current memory usage " . (memory_get_usage()/1024).'k<br>';

versus:

$iterations = 100000;
$a = 'a';
$b = 'b';
echo "start peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "start current memory usage " . (memory_get_usage()/1024).'k<br>';

for($i = 0; $i<$iterations; $i++) {
    $a .= $b;
}
echo "end peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "end current memory usage " . (memory_get_usage()/1024).'k<br>';

So as far as a template engine is concerned, what would be the best way to avoid unnecessary memory consumption? In a development environment it's not a problem, but in production it can become a scalability problem.

Naturally speed is also a concern to me, so the alternative should be about the same speed as this one.

Finally, I think this also has something to do with variable scope. Feel free to correct me, as I am no pro. My understanding is that variables are "unset" by the PHP garbage collector(?) when a function or method ends. In my case, however, the $page we are working on exists for the whole duration of the script, because it is a class property accessed as $this->page, and so the old instances can't be "unset".

EDIT 16.10.2014: To follow up on this question, I did some testing and am leaning towards the suggested solution of exploding the page into parts. Here is a rough, simple sketch of the structure, followed by an explanation below.

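class PageObjectX {
    protected $_parent;

    // Objects are passed by handle, so no & is needed to share the parent.
    public function __construct($parent) { $this->_parent = $parent; }
    /* has a __toString() method, handles how the variable/section is outputted. */
}

class Page {
    protected $_parts = array();
    protected $_source_parts;
    protected $_variables = array();

    public function __construct($s) {
        // preg_split() takes the pattern first and the subject string second.
        $this->_source_parts = preg_split(..., $s);
        foreach ($this->_source_parts as $part) {
            // PageObjectX stands for whichever concrete part class fits $part.
            $this->_parts[] = new PageObjectX($this, ...);
        }
    }

    public function __toString() { return implode('', $this->_parts); }

    public function setVariables($k, $v) { $this->_variables[$k] = $v; }
}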

What I do is explode the template string into an array of parts: regular strings, variables, strings to fetch from the database, and regions/sections. The management of the parts array is encapsulated in the Page class. The array elements are objects: PageVariable, PageString, PageRepeatable, PagePlaintext. Each object provides a __toString() method, which lets each type of part control how it is displayed and helps keep the classes small and manageable. It feels "clean" to me in a way.

Each PageN class gets its data from the main class through a reference to its parent. So all global variables are set on the Page class, and the Page class handles making the single query to the database to get all translated strings and so on.
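
To make the structure above concrete, here is a minimal runnable sketch with only two part types. The {name} placeholder syntax, the getVariable() and setVariable() helpers, and all method bodies are assumptions made for illustration; they are not the actual engine code.

```php
<?php
// Hypothetical sketch: placeholder syntax and method bodies are assumptions.

class PagePlaintext {
    private $text;
    public function __construct($text) { $this->text = $text; }
    public function __toString() { return $this->text; }
}

class PageVariable {
    private $parent;
    private $name;
    public function __construct(Page $parent, $name) {
        $this->parent = $parent;
        $this->name = $name;
    }
    // A variable without data renders as an empty string,
    // so no cleanup pass over the finished page is needed.
    public function __toString() {
        return (string) $this->parent->getVariable($this->name);
    }
}

class Page {
    private $parts = array();
    private $variables = array();

    public function __construct($source) {
        // PREG_SPLIT_DELIM_CAPTURE keeps the captured placeholder names,
        // so odd indexes hold variable names and even indexes hold plain text.
        $chunks = preg_split('/\{(\w+)\}/', $source, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($chunks as $i => $chunk) {
            $this->parts[] = ($i % 2 === 1)
                ? new PageVariable($this, $chunk)
                : new PagePlaintext($chunk);
        }
    }

    public function setVariable($key, $value) { $this->variables[$key] = $value; }

    public function getVariable($key) {
        return isset($this->variables[$key]) ? $this->variables[$key] : '';
    }

    // implode() converts each part to a string through its __toString().
    public function __toString() { return implode('', $this->parts); }
}

$page = new Page('<h1>{title}</h1><p>{missing}</p>');
$page->setVariable('title', 'Todays news');
echo $page; // <h1>Todays news</h1><p></p>
```

The page string is only assembled once, in __toString(), instead of being rewritten for every replacement.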

Repeatables are probably not straightforward. I use a repeatable to display lists or anything else that can be repeated several times, like news items. The content changes, the structure doesn't. So I pass the following array to Page, and when the repeatable named 'news' looks for its data, it gets data for two news items, for example.

$regions['news'][0]['news title'] = 'Todays news';
$regions['news'][0]['news desc'] = 'The united nations...';
$regions['news'][1]['news title'] = 'Yesterdays news';
$regions['news'][1]['news desc'] = 'Meanwhile in Afghanistan the rebels...';
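
As a rough illustration of how a repeatable could consume such an array, the sketch below renders one row template per item. The renderRepeatable() function and the {key} placeholder syntax are hypothetical; the real PageRepeatable class may work quite differently.

```php
<?php
// Hypothetical helper: renders $rowTemplate once for each item in $items.
function renderRepeatable($rowTemplate, array $items) {
    $rows = array();
    foreach ($items as $item) {
        // Map each data key to its assumed {key} placeholder.
        $pairs = array();
        foreach ($item as $key => $value) {
            $pairs['{' . $key . '}'] = $value;
        }
        // strtr() with an array replaces all placeholders in one pass.
        $rows[] = strtr($rowTemplate, $pairs);
    }
    return implode('', $rows);
}

$regions = array();
$regions['news'][0]['news title'] = 'Todays news';
$regions['news'][0]['news desc'] = 'The united nations...';
$regions['news'][1]['news title'] = 'Yesterdays news';
$regions['news'][1]['news desc'] = 'Meanwhile in Afghanistan the rebels...';

$html = renderRepeatable('<li>{news title}: {news desc}</li>', $regions['news']);
echo $html;
```

Only the small row template is copied per item, so the cost grows with the number of items rather than with the size of the whole page.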

If a page element has no data, it is easy to just exclude it in the __toString(). This reduces the need for cleanup for unused parts in a template.

The overall performance of this approach seems pretty good. Memory consumption is about half in initial comparisons: 2M vs. 4M. I'm expecting a much better ratio on big pages, as the test page is quite simple. The speed gain is quite remarkable compared to the string version, where cleanup takes up quite a bit of juice: 0.1s vs. 0.6s for the string version.

I'll post an update with the final results, but this is what I have so far. Hope this helps those who stumble on this page from Google ;)

Upvotes: 3

Views: 157

Answers (2)

hellcode

Reputation: 2698

Which system do you use? For me it's not such a huge difference:
In a plain script:
peak 325.1k, curr 218.7k vs. peak 219.6k, curr 218.7k
In a function of a class:
peak 327.2k, curr 220.8k vs. peak 221.8k, curr 220.8k

I would expect the difference in peak to come from the last operation, where the new value of $a is built while the old value of $a is still in use. That would explain the nearly 100k difference in peak.

Upvotes: 0

Mr. Llama

Reputation: 20889

In your specific example ($page = str_replace($find, $replace, $page);) it won't be possible to avoid making a copy of $page. This applies to all functions (string related or not) that require parameters to be passed by value. However, PHP's garbage collection should free up those unused copies at regular intervals.
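
What can be reduced is the number of intermediate copies. As a sketch, assuming the template uses {name}-style placeholders (an assumption, not something stated in the question), strtr() with an array of pairs builds the result in a single pass instead of producing one new string per replacement:

```php
<?php
// Assumed placeholder syntax; the values are made up for illustration.
$page = '<h1>{title}</h1><p>{body}</p>';

$replacements = array(
    '{title}' => 'Hello',
    '{body}'  => 'World',
);

// One call, one result string, instead of one str_replace() call per placeholder.
$page = strtr($page, $replacements);

echo $page; // <h1>Hello</h1><p>World</p>
```

Note that strtr() does not rescan text it has already replaced and prefers the longest matching key, so the result can differ from a chain of str_replace() calls when replacements overlap.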

If you're still experiencing excessive memory usage, I would strongly recommend you check your code. Make sure that variables have a clearly defined scope and that only required data is stored. There are tools available to help diagnose PHP memory usage, such as php-memprof.
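
Before reaching for a profiler, a quick check with memory_get_usage() can show whether old copies really linger. The sketch below uses an arbitrary 1 MB test string; PHP frees a string as soon as nothing references it anymore, so usage should return to roughly the same level after the reassignment.

```php
<?php
$size = 1024 * 1024; // arbitrary test size of 1 MB

$before = memory_get_usage();

$page = str_repeat('x', $size);
$afterCreate = memory_get_usage();

// str_replace() returns a new string; the old one is released
// when $page is overwritten with the result.
$page = str_replace('x', 'y', $page);
$afterReplace = memory_get_usage();

unset($page);
$afterUnset = memory_get_usage();

printf("create:  +%d bytes\n", $afterCreate - $before);
printf("replace: %+d bytes\n", $afterReplace - $afterCreate);
printf("unset:   %+d bytes\n", $afterUnset - $afterReplace);
```

If the "replace" line in your own code shows steady growth instead, something is still holding a reference to the old strings, for example an array or property that keeps earlier versions of the page.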

In addition, I would also verify that you're using the latest available versions of PHP as garbage collection is continuously improved upon.

Upvotes: 2
