Sammitch

Reputation: 32272

Memory-efficient base64 decoding

We're having some trouble in our application with people pasting images into our rich-text WYSIWYG editor, at which point they end up in the markup as base64-encoded data URIs, e.g.:

<img src="data:image/png;base64,iVBORw..." />

The form submits and processes just fine, but when our application generates a page containing several of these images, PHP can hit its memory limit, and it bloats the page source as well.

What I've done is write some code for our form processor that extracts the embedded images, writes them to files, and puts the resulting URL in the src attribute. The problem is that while an image is being processed, memory usage spikes to 4x the size of the data, which could potentially break the form processor as well.

My POC code:

<?php
function profile($label) {
    printf("%10s %11d %11d\n", $label, memory_get_usage(), memory_get_peak_usage());
}

function handleEmbedded(&$src) {
    $dom = new DOMDocument;
    $dom->loadHTML($src);
    profile('domload');
    $images = $dom->getElementsByTagName('img');
    profile('getimgs');
    foreach ($images as $image) {
        if( strpos($image->getAttribute('src'), 'data:') === 0 ) {
            $image->setAttribute('src', saneImage($image->getAttribute('src')));
        }
    }
    profile('presave');
    $src = $dom->saveHTML();
    profile('postsave');
}

function saneImage($data) {
    // "data:image/png;base64,..." -> "png"
    $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
    $filename = generateFilename('./', 'data_', $type);
    //file_put_contents($filename, base64_decode(substr($data, strpos($data, ';')+8)));
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    // the +8 skips past ";base64," to the start of the encoded payload
    fwrite($fh, substr($data, strpos($data, ';')+8));
    fclose($fh);
    profile('filesaved');
    return $filename;
}

function generateFilename($dir, $prefix, $suffix) {
    $dir = preg_replace('@/$@', '', $dir);
    do {
        $filename = sprintf("%s/%s%s.%s", $dir, $prefix, md5(mt_rand()), $suffix);
    } while( file_exists($filename) );
    return "foo.$suffix";
    return $filename;
}

profile('start');
$src = file_get_contents('derp.txt');
profile('load');
handleEmbedded($src);
profile('end');

Output:

     start      236296      243048
      load     1306264     1325312
   domload     1306640     2378768
   getimgs     1306880     2378768
 filesaved     2371080     4501168
   presave     1307264     4501168
  postsave      244152     4501168
       end      243480     4501168

As you can see, memory usage still jumps into the 4MB range while the file is saved, despite my attempt to shave bytes by using a stream filter. I think there's some buffering happening in the background; if I were simply transcribing between files I'd break the data into chunks, but I don't know whether that's feasible/advisable in this case.
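
For reference, the file-to-file version I have in mind would look something like this (a rough sketch with placeholder filenames, assuming the encoded data contains no line breaks):

<?php
$in  = fopen('encoded.b64', 'r');  // placeholder source file
$out = fopen('decoded.png', 'w');  // placeholder destination
stream_filter_append($out, 'convert.base64-decode');
while( !feof($in) ) {
    // 8192 is a multiple of 4, so each write ends on a complete base64 quad
    fwrite($out, fread($in, 8192));
}
fclose($in);
fclose($out);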

Is there anywhere I might be able to pare down my memory usage?


Upvotes: 4

Views: 1375

Answers (1)

Sammitch

Reputation: 32272

Props to Norbert for punching a hole in my mental block:

function saneImage($data) {
    $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
    $filename = generateFilename('./', 'data_', $type);
    writefile($filename, $data);
    profile('filesaved');
    return $filename;
}

function writefile($filename, $data) {
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    $chunksize=12*1024;
    // skip past ";base64," to the start of the encoded payload
    $offset = strpos($data, ';')+8;
    // feed the filter one small chunk at a time instead of one huge string
    for( $i=0; $chunk=substr($data,($chunksize*$i)+$offset,$chunksize); $i++ ) {
        fwrite($fh, $chunk);
    }
    fclose($fh);
}
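
(Note that $chunksize is a multiple of 4, so every chunk handed to convert.base64-decode ends on a complete base64 quad. If you tune the chunk size it seems safest to keep it divisible by 4 rather than relying on the filter to buffer a partial quad across writes.)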

Output:

     start      237952      244672
      load     1307920     1327000
   domload     1308296     2380664
   getimgs     1308536     2380664
 filesaved     2372712     2400592
   presave     1308944     2400592
  postsave      245832     2400592
       end      245160     2400592

Upvotes: 1
