user2771609

Reputation: 1903

iText PdfSmartCopy is creating duplicate fonts

I am using iText (5.5.12) PdfSmartCopy to merge together two files that have embedded, unsubsetted fonts (and happen to be generated on the same machine, so I know they are referring to the same font) in the hope that the final result will have only a single copy of the font.

However, I am finding that the merged result has the font embedded twice.

Here is the code I am using:

String[] srcs = ...
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(result));

document.open();
for (int i = 0; i < srcs.length; i++) {
    PdfReader reader = new PdfReader(srcs[i]);
    copy.addDocument(reader);
    copy.freeReader(reader);
    reader.close();
}
document.close();

This is the output of pdffonts on the relevant files:

Input file 1:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes     14  0

Input file 2:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes     11  0

Output file:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes      3  0
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes     25  0

Upvotes: 1

Views: 506

Answers (1)

mkl

Reputation: 95918

Contrary to your assumption that you have

two files that have embedded, unsubsetted fonts

the fonts are subsetted, and differently so.

From file1.pdf:

[Image: glyphs of the font embedded in file 1]

From file2.pdf:

[Image: glyphs of the font embedded in file 2]

As you can see, there are numerous differences: there is a non-empty glyph for "1" in file 1 but not in file 2, vice versa for "2", and so on.

Thus, these fonts are not identical and PdfSmartCopy correctly did not replace one by the other.
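If you want to check for yourself whether two embedded font programs are identical, one option is to extract their decoded stream bytes and compare digests. The following is only a minimal sketch of that idea, not a description of how PdfSmartCopy works internally; the class and method names are made up for illustration, and it assumes you have already obtained the two font programs as byte arrays by some other means.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FontProgramCompare {
    // Hypothetical helper: true only if both byte arrays have the same SHA-256 digest.
    static boolean sameContent(byte[] first, byte[] second) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] firstDigest = md.digest(first);   // digest() resets md afterwards
            byte[] secondDigest = md.digest(second);
            return MessageDigest.isEqual(firstDigest, secondDigest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Two differently subsetted copies of the same typeface would be reported as different by such a check, even though both are named TimesNewRomanPSMT.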


I assume that pdffonts did not recognize them as subsetted because they are not properly marked as subset fonts: their names lack the required subset tag, and they lack the optional CharSet listing of the character names defined in a font subset. So not only are the fonts subsetted, contrary to your assumption, but the subsetting was also done incorrectly.
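For reference, the subset tag required by the PDF specification is a prefix of exactly six uppercase letters followed by a plus sign in the font name. A minimal sketch of a check for that naming convention could look like this; the class name, method name, and the tagged sample name `ABCDEF+TimesNewRomanPSMT` are hypothetical, and the check looks only at the name, so it cannot detect subsets that were left untagged as in your files.

```java
import java.util.regex.Pattern;

class SubsetTagCheck {
    // Six uppercase letters, a plus sign, then the actual font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+.+");

    // Hypothetical helper: true if the font name carries a subset tag prefix.
    static boolean hasSubsetTag(String baseFontName) {
        return baseFontName != null && SUBSET_TAG.matcher(baseFontName).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasSubsetTag("TimesNewRomanPSMT"));        // prints false
        System.out.println(hasSubsetTag("ABCDEF+TimesNewRomanPSMT")); // prints true
    }
}
```

A tool that relies on the name alone would report the fonts in your files as not subsetted, which matches the "sub no" column in your pdffonts output.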

Thus, don't blame pdffonts for your incorrect assumptions but instead the PDF generator which created the input files.

Upvotes: 1
