alemao2x

Reputation: 15

ImageMagick with PHP text overflowing PDF to JPG conversion

I'm trying to convert a PDF file to JPG using ImageMagick with PHP and CakePHP. The PDF itself looks exactly as it should, but in the image generated from it the text always overflows the borders of the page.

So far I've tried tweaking the generation code with no success, and I've read a lot of the PHP docs (http://php.net/manual/pt_BR/book.imagick.php).

Here is the conversion code:

            $image = new Imagick();
            $image->setResolution(300,300);
            $image->setBackgroundColor('white');
            $image->readImage($workfile);
            $image->setGravity(Imagick::GRAVITY_CENTER);
            $image->setOption('pdf:fit-to-page',true);
            $image->setImageFormat('jpeg');
            $image->setImageCompression(imagick::COMPRESSION_JPEG);
            $image->setImageCompressionQuality(60);
            $image->scaleImage(1200,1200, true);
            $image->mergeImageLayers(Imagick::LAYERMETHOD_FLATTEN);
            $image->setImageAlphaChannel(Imagick::ALPHACHANNEL_REMOVE);
            $image->writeImage(WWW_ROOT . 'files' . DS . 'Snapshots' . DS . $filename);

Here are the results: https://i.sstatic.net/nmSRF.jpg

The first image is the PDF before conversion; the second is the image generated from it, where the text on the right side overflows.

So, why is this happening? Alternatives to any of the tools involved (Ghostscript, ImageMagick, etc.) are also welcome!

Thanks everyone!

Upvotes: 1

Views: 1283

Answers (1)

KenS

Reputation: 31207

It's very hard to say why you see the result you do without seeing the original PDF file, rather than a picture of it.

The most likely explanation is that your original PDF file uses a font but does not embed that font in the PDF. When Ghostscript comes to render it to an image, it must then substitute 'something' in place of the missing font. If the metrics (e.g. spacing) of the substituted font do not precisely match the metrics of the missing font, then the rendered text will be misplaced or incorrectly sized. Of course, since it's not using the same font, it won't match the shapes of the characters either.

This can result in several different kinds of problems, but what you show is pretty typical of one such class of problem. Although you haven't mentioned it, I can also see several places in the document where text overwrites as well, which is another symptom of exactly the same problem.

If this is the case, then the Ghostscript back-channel transcript will have told you that it was unable to find a font and is substituting a named font for the missing one. I can't tell you whether ImageMagick stores that anywhere; my guess is that it doesn't. However, you can copy the command line from the ImageMagick delegates.xml file and use it to run Ghostscript yourself, and then you will be able to see whether that's what is happening.
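
As a rough sketch of that check, the shell functions below render the first page with Ghostscript directly and then filter the transcript for font warnings. The function names, file names, and the grep pattern are assumptions for illustration; the exact wording of the messages varies between Ghostscript versions, so adjust the pattern to what yours prints.

```shell
#!/bin/sh
# Sketch: render a PDF with Ghostscript directly and surface font warnings.
# Function names and file names are illustrative, not part of any tool.

# Print transcript lines that mention a missing or substituted font.
# The pattern is a loose guess at the wording; adapt it to your version.
font_warnings() {
    grep -iE "substitut|can't find" "$1" || true
}

# Render page 1 of a PDF to JPEG at 300 dpi, saving the transcript to a log.
render_first_page() {
    pdf=$1
    out=$2
    log=$3
    gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg -r300 \
       -dFirstPage=1 -dLastPage=1 \
       -sOutputFile="$out" "$pdf" >"$log" 2>&1
}
```

For example: `render_first_page input.pdf page1.jpg gs.log; font_warnings gs.log`. If Poppler is installed, `pdffonts input.pdf` also lists each font the PDF references along with whether it is embedded.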

If this is what is happening, then you must either:

  1. Create your PDF file with the fonts embedded (this is good practice anyway)
  2. Supply Ghostscript with a copy of the missing font as a substitute
  3. Live with the text as it is
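
For option 2, here is a minimal sketch, assuming the missing font's file sits in a directory you control. Ghostscript's `-sFONTPATH` switch names extra directories to scan for fonts it cannot otherwise find. The function name and paths below are placeholders.

```shell
#!/bin/sh
# Sketch for option 2: let Ghostscript search an extra directory for fonts.
# The function name and all paths are placeholders.

# Render a PDF to one JPEG per page at 300 dpi, adding font_dir to the
# directories Ghostscript scans for fonts it cannot find elsewhere.
render_with_fonts() {
    font_dir=$1
    pdf=$2
    out_pattern=$3   # e.g. page-%03d.jpg
    gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg -r300 \
       -sFONTPATH="$font_dir" \
       -sOutputFile="$out_pattern" "$pdf"
}
```

For example: `render_with_fonts /path/to/fonts input.pdf 'page-%03d.jpg'`. When Ghostscript is launched by ImageMagick rather than by hand, setting the `GS_FONTPATH` environment variable for the PHP process should have a similar effect, though that is untested in this setup.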

Upvotes: 1
