Reputation: 40624
I am using this command to convert a PDF to a set of JPEG files:
convert -strip -quality 100 -alpha off \
-density 165% -scene 1 tmp3GtW_h.pdf /tmp/a1.jpg
Here is the original PDF:
The font is thinner and more akin to Helvetica.
Here is the outcome:
The font in the output JPEG file is different and thicker.
The convert command shows this warning:
**** Warning: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Microsoft? PowerPoint? 2013 <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
The version of convert is:
$ convert --version
Version: ImageMagick 6.8.9-7 Q16 x86_64 2014-12-30 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC OpenMP
Delegates: jng jpeg png x xml zlib
Ghostscript version is:
$ gs --version
9.10
My questions are
1) How can I resolve this issue?
2) How can I tell what font the PDF file is using?
3) How can I tell what fonts are available to convert and gs?
EDIT: Found an answer to question 2. Here is the output of the pdffonts command:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Intro Black Italic                   Type 1            WinAnsi          no  no  no     145  0
Intro Regular                        Type 1            WinAnsi          no  no  no     147  0
Intro Black Inline Caps              Type 1            WinAnsi          no  no  no     388  0
ABCDEE+Segoe UI                      TrueType          WinAnsi          yes yes no    2233  0
ABCDEE+Segoe UI,Italic               CID TrueType      Identity-H       yes yes yes   2607  0
ABCDEE+Segoe UI,Italic               TrueType          WinAnsi          yes yes no    2612  0
Intro Bold Italic                    Type 1            WinAnsi          no  no  no    3781  0
Upvotes: 4
Views: 3208
Reputation: 90193
If you want to know all relevant details about the fonts used by a PDF document, use
pdffonts the.pdf
You'll see in the column emb, indicated with yes or no, whether a font is embedded.
If a font is NOT embedded, what you are seeing will happen: the PDF renderer does not find the font in the file, so it uses a substitution font. The document will then most likely look different from viewer to viewer and from system to system, because each viewer uses a different algorithm to substitute missing fonts.
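To pull just the non-embedded fonts out of a pdffonts report, a small filter can help. This is a minimal sketch, not part of pdffonts itself: the function name list_unembedded_fonts is made up for illustration, and it assumes the column layout shown in the question (a dashed second line whose first group spans the name column, and emb as the fifth field from the right). A font name longer than the name column would break that assumption.

```shell
# Hypothetical helper: print the names of fonts that pdffonts reports as not embedded.
# Reads pdffonts output on stdin and writes one font name per line.
list_unembedded_fonts() {
  awk '
    # Line 2 is the dashed separator; its first group is as wide as the name column.
    NR == 2 { width = length($1); next }
    # Counting from the right: object ID (2 fields), uni, sub, emb.
    NR > 2 && $(NF-4) == "no" {
      name = substr($0, 1, width)
      sub(/ +$/, "", name)   # trim the column padding
      print name
    }
  '
}
```

It would be used as pdffonts the.pdf | list_unembedded_fonts, which for the file in the question should list the four Intro fonts.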
The pdffonts command has the -subst parameter. So
pdffonts -subst the.pdf
will report which substitution fonts could possibly be used. Since Poppler, the library pdffonts is based upon, uses FreeType as its font engine, the reported substitution fonts will likely be valid for every viewer that also uses FreeType.
Acrobat for example does NOT use FreeType, but its own font rendering engine. So in Adobe Reader you'll likely get different substitution fonts.
Ghostscript:
The command
gs -h
will report (amongst other things) which directories it will use as its path to search for fonts.
Any Ghostscript command you run can be amended by
-sFONTPATH=/path/to/dir:/path/to/other/dir
to tell Ghostscript to look in other directories for needed fonts for the duration of the current command.
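As a rough illustration of how -sFONTPATH fits into a complete rendering command, the sketch below wraps Ghostscript's jpeg device in a function. The function name render_pdf_pages, the GS override variable and the example paths are assumptions made for this example. Reading the question's -density 165% as 165 dpi is also an assumption.

```shell
# Hypothetical wrapper: render every page of a PDF to JPEG with Ghostscript,
# adding one extra font directory for this run only.
# Usage: render_pdf_pages INPUT.pdf FONT_DIR OUTPUT_PATTERN [DPI]
# GS overrides the Ghostscript binary; GS=echo prints the arguments instead of rendering.
render_pdf_pages() {
  pdf=$1
  font_dir=$2
  out_pattern=$3      # e.g. /tmp/a%d.jpg; Ghostscript replaces %d with the page number
  dpi=${4:-165}       # assumed equivalent of the -density value in the question

  "${GS:-gs}" \
    -dNOPAUSE -dBATCH \
    -sDEVICE=jpeg -dJPEGQ=100 \
    -r"$dpi" \
    -sFONTPATH="$font_dir" \
    -sOutputFile="$out_pattern" \
    "$pdf"
}
```

For instance, render_pdf_pages tmp3GtW_h.pdf /path/to/dir '/tmp/a%d.jpg' would write one JPEG per page, with /path/to/dir standing in for wherever the missing fonts actually live.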
ImageMagick:
This command
convert -list font
will report all fonts which ImageMagick has found on the system.
So, very clearly, the four different Intro fonts are not embedded in the PDF. This is a very uncommon font, certainly not among the top 200 used worldwide in PDFs. (I should know: I've harvested 1,000,000 PDFs from the web and am currently building a statistical database of their various properties, and I don't have a single Intro font in there.)
Whoever created that PDF, or whichever software did so, clearly didn't have much clue about document processing, because every other system, user or application that has to open, view or process that document will see those pages rendered very differently from what its creator saw.
In order to process this PDF into images you should not rely on ImageMagick, but run Ghostscript directly:
1) Find out where on your system the Intro fonts are to be found.
2) Pass that location to Ghostscript with the -sFONTPATH=... parameter as explained above.
Let me re-iterate:
You cannot tell convert to use any font for rendering the PDF pages to raster images.
convert cannot insert any 'font' into the raster data in the aftermath.
The fonts convert can use are only for its own drawing, writing, captioning and annotating operations.
The fonts used for rendering the PDF pages are picked by Ghostscript, which you can direct with the -sFONTPATH=... argument.
You have to find out where the Intro font family is. I cannot do that for you, sorry.
Running convert -verbose will give you some insight about how exactly ImageMagick employs Ghostscript as its 'delegate' for PDF input processing, and which command line parameters it uses.
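Locating the Intro font files is left to the reader above. One possible way to search for them is sketched here, assuming the fonts are installed as ordinary Type 1, TrueType or OpenType files and that your find supports -iname (GNU and BSD find do). The function name find_font_files and the directories in the usage note are illustrative only.

```shell
# Hypothetical helper: search directories for font files whose file name
# contains a given string, ignoring case.
# Usage: find_font_files NAME DIR [DIR...]
find_font_files() {
  name=$1
  shift
  for dir in "$@"; do
    [ -d "$dir" ] || continue   # silently skip directories that do not exist
    find "$dir" -type f -iname "*${name}*" \
      \( -iname '*.pfb' -o -iname '*.pfa' -o -iname '*.ttf' -o -iname '*.otf' \)
  done
}
```

A call such as find_font_files intro /usr/share/fonts "$HOME/.fonts" prints matching paths, and the directory of a match is a candidate value for -sFONTPATH. If nothing is printed, the fonts are probably not installed on the machine at all, and Ghostscript will keep substituting.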
Upvotes: 5