My Head Hurts

Reputation: 37665

Accessing font files within PDF

We are currently working with a selection of publishers to generate online books from their PDFs. Our legacy app uses Flex, so we are converting the PDFs to SWF files using PDF2SWF from SWFTools.

The problem is that the text within the SWF document is not highlighted by our Flex reader when the user performs a search. After a quick investigation we found that, for the text to be extracted, we need to embed the fonts used by the PDF document:

http://wiki.swftools.org/wiki/How_do_I_highlight_text_in_the_SWF%3F

pdf2swf -F $YOUR_FONTS_DIR$ -f input.pdf -o output.swf

As the command above shows, we need a path to a font directory containing the fonts used within that PDF.
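Since the app is written in Java, the call above could be driven from code roughly as follows. This is only a sketch: the class and method names (`Pdf2SwfCommand`, `build`, `run`) are made up for illustration, it assumes `pdf2swf` is on the `PATH`, and it passes the same flags in the same order as the command above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of SWFTools or PDFBox.
final class Pdf2SwfCommand {

    // Builds the argument list in the same order as the command shown above.
    static List<String> build(Path fontDir, Path inputPdf, Path outputSwf) {
        return List.of(
                "pdf2swf",
                "-F", fontDir.toString(),
                "-f", inputPdf.toString(),
                "-o", outputSwf.toString());
    }

    // Runs pdf2swf (assumed to be on the PATH) and returns its exit code.
    static int run(Path fontDir, Path inputPdf, Path outputSwf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(build(fontDir, inputPdf, outputSwf))
                .inheritIO() // show the tool's own output on our console
                .start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems when a path contains spaces.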

Since we will be converting a large number of PDFs, is it possible to access the font files directly from the PDF rather than storing a lot of fonts within our app?

Additional Information

Our app is written in Java.

We are currently using PDFBox and Ghostscript within the app, so a solution that uses these libraries would be preferred, but we are open to all ideas.

Upvotes: 6

Views: 1474

Answers (1)

KenS

Reputation: 31141

PDF files don't contain font 'files'. They may not even contain any fonts at all, though this is rare. The embedded font data can be in a bewildering variety of formats:

  • Type 1 PostScript fonts
  • Type 3 PostScript fonts
  • TrueType fonts
  • PostScript CFF fonts
  • CIDFonts with Type 1 PostScript outlines
  • CIDFonts with Type 3 PostScript outlines
  • CIDFonts with TrueType outlines
  • CIDFonts with CFF outlines
  • CIDFonts with bitmap images

Will your application be able to read all of these font formats? If you want to use the fonts, you must use the ones embedded in the PDF file, because these are very often subset fonts supplied with a custom Encoding. That means that even if you have the original font, you can't use it, because its Encoding will not match.
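One way to get a rough idea of how many fonts are subsets is to look at the font names. By PDF convention, the name of a subset font starts with a tag of six uppercase letters followed by a plus sign, for example `ABCDEF+Helvetica`. The sketch below checks for that tag. The class and method names are hypothetical, and how the font name is read from the PDF (PDFBox, Ghostscript or anything else) is left out. A name without the tag is also no guarantee that the complete font is present.

```java
import java.util.regex.Pattern;

// Hypothetical helper for inspecting font names taken from a PDF.
final class SubsetFontNames {

    // Subset tag convention: exactly six uppercase ASCII letters, then '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+.+");

    // True if the name carries a subset tag such as "ABCDEF+Helvetica".
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET_TAG.matcher(fontName).matches();
    }

    // Returns the name without the seven-character tag, or the name unchanged.
    static String stripSubsetTag(String fontName) {
        return isSubset(fontName) ? fontName.substring(7) : fontName;
    }
}
```

For example, `SubsetFontNames.isSubset("ABCDEF+Helvetica")` is true and `SubsetFontNames.stripSubsetTag("ABCDEF+Helvetica")` gives `Helvetica`, while a plain `Helvetica` is returned unchanged.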

Of course, it may be that these PDF files are all created in a consistent way and do not use embedded fonts, but I have my doubts...

Upvotes: 7
