We're having some trouble in our application with people pasting images into our rich-text WYSIWYG editor, at which point they exist as base64-encoded strings, e.g.:
<img src="data:image/png;base64,iVBORw..." />
The form is submitted and processed just fine, but when our application generates a page containing multiple such images it can push PHP over its memory limit, as well as bloating the page source, etc.
What I've done is written some code to add to our form processor that extracts the embedded images, writes them to files, and puts the URL in the src attribute. The problem is that while processing an image, memory usage spikes to 4x the size of the data, which could potentially break the form processor as well.
<?php
// Print a labeled snapshot of current and peak memory usage, in bytes.
function profile($label) {
    printf("%10s %11d %11d\n", $label, memory_get_usage(), memory_get_peak_usage());
}

// Rewrite any <img> whose src is a data: URI to point at an extracted file.
function handleEmbedded(&$src) {
    $dom = new DOMDocument;
    $dom->loadHTML($src);
    profile('domload');
    $images = $dom->getElementsByTagName('img');
    profile('getimgs');
    foreach ($images as $image) {
        if (strpos($image->getAttribute('src'), 'data:') === 0) {
            $image->setAttribute('src', saneImage($image->getAttribute('src')));
        }
    }
    profile('presave');
    $src = $dom->saveHTML();
    profile('postsave');
}
// Pull the base64 payload out of a data: URI, decode it to a file on disk,
// and return the new filename.
function saneImage($data) {
    // "data:image/png;base64,..." -> "png"
    $type = explode('/', substr($data, 5, strpos($data, ';') - 5))[1];
    $filename = generateFilename('./', 'data_', $type);
    //file_put_contents($filename, base64_decode(substr($data, strpos($data, ';')+8)));
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    fwrite($fh, substr($data, strpos($data, ';') + 8)); // skip past ";base64,"
    fclose($fh);
    profile('filesaved');
    return $filename;
}
// Generate a filename in $dir that does not collide with an existing file.
function generateFilename($dir, $prefix, $suffix) {
    $dir = preg_replace('@/$@', '', $dir);
    do {
        $filename = sprintf("%s/%s%s.%s", $dir, $prefix, md5(mt_rand()), $suffix);
    } while (file_exists($filename));
    return $filename;
}
profile('start');
$src = file_get_contents('derp.txt');
profile('load');
handleEmbedded($src);
profile('end');
start 236296 243048
load 1306264 1325312
domload 1306640 2378768
getimgs 1306880 2378768
filesaved 2371080 4501168
presave 1307264 4501168
postsave 244152 4501168
end 243480 4501168
As you can see, memory usage still jumps into the 4MB range while the file is saved, despite trying to shave bytes by using a stream filter. The whole base64 payload is handed to fwrite() in one go, and the substr() call makes yet another copy of it first, so I think there's some buffering happening in the background. If I were simply transcribing between files I'd break the data into chunks, but I don't know if that is feasible/advisable in this case.
Is there anywhere I might be able to pare down my memory usage?
Note: using file_put_contents() (the commented-out line above) and changing handleEmbedded() to not pass by reference both give the same memory usage. derp.txt contains a snippet of HTML with a single base64-encoded image.
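If you want to reproduce the numbers, a fixture along these lines should do (a sketch, with test.png standing in for any image of roughly a megabyte):

// hypothetical fixture generator; 'test.png' is a stand-in for any ~1MB image
$png = base64_encode(file_get_contents('test.png'));
file_put_contents('derp.txt',
    '<p>Some pasted text.</p><img src="data:image/png;base64,' . $png . '" />');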
Upvotes: 4
Views: 1375
Props to Norbert for punching a hole in my mental block:
function saneImage($data) {
    $type = explode('/', substr($data, 5, strpos($data, ';') - 5))[1];
    $filename = generateFilename('./', 'data_', $type);
    writefile($filename, $data);
    profile('filesaved');
    return $filename;
}

// Feed the base64 payload through the decode filter in small pieces so the
// full payload is never duplicated. 12KB is a multiple of 4, so every chunk
// is a whole number of base64 quads.
function writefile($filename, $data) {
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    $chunksize = 12 * 1024;
    $offset = strpos($data, ';') + 8; // skip past ";base64,"
    for ($i = 0; $chunk = substr($data, ($chunksize * $i) + $offset, $chunksize); $i++) {
        fwrite($fh, $chunk);
    }
    fclose($fh);
}
Output:
start 237952 244672
load 1307920 1327000
domload 1308296 2380664
getimgs 1308536 2380664
filesaved 2372712 2400592
presave 1308944 2400592
postsave 245832 2400592
end 245160 2400592
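For comparison, PHP's data:// stream wrapper understands ";base64," payloads itself, so stream_copy_to_stream() could do the chunked copying instead. This is a sketch of an untested alternative, assuming allow_url_fopen is enabled; the wrapper may hold a full decoded copy in memory depending on the PHP version, so run it through the same profile() calls before preferring it:

// Untested alternative: let the stream layer do the chunking.
// Assumes allow_url_fopen is on; the data:// wrapper may buffer the whole
// decoded payload internally, so profile this before adopting it.
function writefileViaWrapper($filename, $data) {
    // getAttribute('src') yields "data:image/png;base64,..."; the wrapper
    // is addressed as "data://...", so swap the prefix.
    $in  = fopen('data://' . substr($data, 5), 'r');
    $out = fopen($filename, 'w');
    stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
}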
Upvotes: 1