Reputation: 1903
I am using iText (5.5.12) PdfSmartCopy to merge two files that have embedded, unsubsetted fonts (and happen to be generated on the same machine, so I know they refer to the same font), in the hope that the final result will contain only a single copy of the font.
However, I am finding that the merged result has the font embedded twice.
String[] srcs = ...
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(result));
document.open();
for (int i = 0; i < srcs.length; i++) {
    PdfReader reader = new PdfReader(srcs[i]);
    // Copy all pages of this source, then release the reader's resources
    copy.addDocument(reader);
    copy.freeReader(reader);
    reader.close();
}
document.close();
pdffonts on the relevant files:

Input file 1:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes     14  0
Input file 2:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes     11  0
Output file:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes      3  0
TimesNewRomanPSMT                    CID TrueType      Identity-H       yes no  yes     25  0
Upvotes: 1
Views: 506
Reputation: 95918
Contrary to your assumption that you have "two files that have embedded, unsubsetted fonts", the fonts are subsetted, and differently so.
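From file1.pdf: [image of the embedded font's glyphs, not preserved]
From file2.pdf: [image of the embedded font's glyphs, not preserved]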
As you can see, there are numerous differences: there is a non-empty glyph for "1" in file 1 but not in file 2, vice versa for "2", and so on.
Thus, these fonts are not identical, and PdfSmartCopy correctly did not replace one with the other.
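To illustrate why differing subsets cannot be merged, here is a rough sketch of content-based deduplication in general. It is not iText's actual implementation; the class `ContentDedupSketch`, its methods, and the choice of SHA-256 are all assumptions made for illustration. The point is only that an object is reused when its bytes match exactly, so two font programs that differ in even one glyph end up as two separate objects.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, NOT iText code: content is reused only on an exact byte match.
final class ContentDedupSketch {
    // Maps a content digest to the object number already assigned to that content.
    private final Map<String, Integer> seen = new HashMap<>();
    private int nextObjectNumber = 1;

    /** Returns the object number under which the given content is stored. */
    int add(byte[] content) {
        String key = digest(content);
        Integer existing = seen.get(key);
        if (existing != null) {
            return existing; // identical bytes: reuse the earlier object
        }
        int assigned = nextObjectNumber++;
        seen.put(key, assigned);
        return assigned;
    }

    /** Hex-encoded SHA-256 digest of the content (the algorithm choice is an assumption). */
    static String digest(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ContentDedupSketch sketch = new ContentDedupSketch();
        byte[] fontInFile1 = {1, 2, 3, 4}; // stand-in bytes for a font program
        byte[] fontInFile2 = {1, 2, 3, 5}; // differs in a single byte
        System.out.println(sketch.add(fontInFile1));
        System.out.println(sketch.add(fontInFile2));
        System.out.println(sketch.add(fontInFile1.clone()));
    }
}
```

In this sketch the third call returns the same number as the first because the bytes are identical, while the second call gets a new number. Two differently subsetted copies of TimesNewRomanPSMT are in the position of the second call.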
I assume that pdffonts did not recognize them as subsetted because they are not properly marked as subset fonts; in particular, their names don't have the required subset tags and they don't have the optional CharSet listing of the character names defined in a font subset. So not only are the fonts subsetted, the subsetting was also done incorrectly.
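For reference, the PDF specification requires the name of a subsetted font to start with a tag of six uppercase letters followed by a plus sign, for example ABCDEF+TimesNewRomanPSMT. A minimal sketch of such a name check follows; the class and method names are hypothetical, and this only inspects the name, which is exactly why an untagged subset such as the ones in these files slips through.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks only the naming convention, not the font's actual contents.
final class SubsetTagCheck {
    // Six uppercase letters, a plus sign, then the base font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+.+");

    static boolean hasSubsetTag(String fontName) {
        return fontName != null && SUBSET_TAG.matcher(fontName).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasSubsetTag("ABCDEF+TimesNewRomanPSMT")); // true
        System.out.println(hasSubsetTag("TimesNewRomanPSMT"));        // false
    }
}
```

A check like this reports the fonts in both input files as not subsetted, since both are named plain TimesNewRomanPSMT, even though their glyph data has been reduced.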
So don't blame pdffonts for the incorrect assumption; blame the PDF generator that created the input files.
Upvotes: 1