Reputation: 13
PDFBox renderImageWithDPI only partially renders text because of missing embedded(?) fonts.
Using PDFBox 2.0.28, then tried PDFBox 3.0.0-RC1.
Created a PDDocument using Loader.loadPDF.
Created a PDFRenderer from the PDDocument.
Executed renderImageWithDPI(pagenum, dpi, RGBObj) on the PDFRenderer.
Obtained a java.awt.image.BufferedImage.
Wrote it as a JPG using javax.imageio.ImageIO.
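For reference, the last two steps can be sketched with the JDK alone. This is a minimal sketch, not the code from the project: the class and method names (JpegWriteSketch, toOpaqueRgb, writeJpeg) are hypothetical, and a blank BufferedImage stands in for the page returned by renderImageWithDPI. It assumes the standard JDK JPEG plugin, which generally will not encode an alpha channel, so the image is flattened to opaque RGB first and the boolean result of ImageIO.write is checked.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper; not part of PDFBox.
final class JpegWriteSketch {

    /** Returns the image unchanged if it is already TYPE_INT_RGB, else an opaque copy on white. */
    static BufferedImage toOpaqueRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Paint a white background so transparent pixels do not turn black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Writes the image as JPEG and fails loudly if no writer accepted it. */
    static void writeJpeg(BufferedImage image, File target) throws IOException {
        // ImageIO.write returns false (without throwing) when no suitable writer is found.
        if (!ImageIO.write(toOpaqueRgb(image), "jpg", target)) {
            throw new IOException("No JPEG writer accepted the image: " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the BufferedImage obtained from the renderer.
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_ARGB);
        File out = File.createTempFile("page-", ".jpg");
        writeJpeg(page, out);
        System.out.println(out + " (" + out.length() + " bytes)");
    }
}
```

Checking the return value rules out a silent write failure, so any content still missing from the JPG has to come from the rendering step itself.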
However, content is missing from the resulting images.
I extracted 2 sample problematic pages from the PDF using PDFsam Basic
and highlighted the areas where the content is missing.
Executing PreflightParser.validate produces the messages below:
1.4 : Trailer Syntax error, /XRef cross reference streams are not allowed
5.2.2 : Forbidden field in an annotation definition, Flags of Link annotation are invalid
2.3.2 : Unexpected value for key in Graphic object definition, Unexpected 'true' value for 'Interpolate' Key
2.4.2 : Invalid Color space, The operator "k" can't be used with RGB Profile
2.4.3 : Invalid Color space, The operator "f" can't be used without Color Profile
3.1.4 : Invalid Font definition, ELWKFI+OptimaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, JECWGC+InsigniaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, PHSMMZ+OptimaLTStd-Bold: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, EHCNNL+OptimaLTStd-Italic: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, QBVSKF+HelveticaLTStd-Obl: The Charset entry is missing for the Type1 Subset
3.1.9 : Invalid Font definition, UBAPGG+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, UBAPGG+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, UBAPGG+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, ORMCFE+HelveticaLTStd-Obl: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, ORMCFE+HelveticaLTStd-Obl: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, ORMCFE+HelveticaLTStd-Obl: The FontFile can't be read
3.1.9 : Invalid Font definition, TFEWKU+HelveticaLTStd-Roman: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, TFEWKU+HelveticaLTStd-Roman: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, TFEWKU+HelveticaLTStd-Roman: The FontFile can't be read
3.1.4 : Invalid Font definition, CRQQXS+OptimaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, MVVAWX+InsigniaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, YIWFBD+OptimaLTStd-Bold: The Charset entry is missing for the Type1 Subset
3.1.11 : Invalid Font definition, JYHLHF+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.1.9 : Invalid Font definition, LDXBBC+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, LDXBBC+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, LDXBBC+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, FSNSYC+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, FSNSYC+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, FSNSYC+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, LVYKUL+InsigniaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, LVYKUL+InsigniaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, LVYKUL+InsigniaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, FUYTUP+OptimaLTStd-Italic: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, FUYTUP+OptimaLTStd-Italic: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, FUYTUP+OptimaLTStd-Italic: The FontFile can't be read
3.1.9 : Invalid Font definition, GZVYQO+OptimaLTStd-Bold: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, GZVYQO+OptimaLTStd-Bold: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, GZVYQO+OptimaLTStd-Bold: The FontFile can't be read
3.1.9 : Invalid Font definition, GWNIWZ+HelveticaLTStd-Roman: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, GWNIWZ+HelveticaLTStd-Roman: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, GWNIWZ+HelveticaLTStd-Roman: The FontFile can't be read
7.1 : Error on MetaData, Metadata is not a stream
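A list this long is easier to read when grouped by error code. The following is a rough sketch that works on the message lines as plain text, exactly as pasted above; the class name PreflightSummary and the method fontsByCode are made up for illustration, and the two regular expressions assume the "code : category, details" layout and the six-letter subset prefix seen in this output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; operates on copied text, not on PDFBox objects.
final class PreflightSummary {

    // "3.2.3 : Font damaged, UBAPGG+OptimaLTStd: The FontFile can't be read"
    private static final Pattern LINE =
            Pattern.compile("^(\\d+(?:\\.\\d+)*) : ([^,]+), (.*)$");

    // Subset font names look like "UBAPGG+OptimaLTStd" followed by a colon.
    private static final Pattern FONT =
            Pattern.compile("^([A-Z]{6}\\+[^:]+): .*$");

    /** Maps each error code to the font names its messages mention (empty set if none). */
    static Map<String, Set<String>> fontsByCode(List<String> messages) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String message : messages) {
            Matcher line = LINE.matcher(message.trim());
            if (!line.matches()) {
                continue; // not a validation message, e.g. a blank line
            }
            Set<String> fonts = result.computeIfAbsent(line.group(1), k -> new TreeSet<>());
            Matcher font = FONT.matcher(line.group(3));
            if (font.matches()) {
                fonts.add(font.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "3.1.9 : Invalid Font definition, UBAPGG+OptimaLTStd: mandatory CIDToGIDMap missing",
                "3.2.3 : Font damaged, UBAPGG+OptimaLTStd: The FontFile can't be read",
                "3.2.3 : Font damaged, LDXBBC+OptimaLTStd: The FontFile can't be read",
                "7.1 : Error on MetaData, Metadata is not a stream");
        fontsByCode(sample).forEach((code, fonts) -> System.out.println(code + " -> " + fonts));
    }
}
```

Grouped this way, the fonts listed under 3.2.3 ("The FontFile can't be read") can be compared directly with the font names in the runtime warnings.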
These messages are consistent with the warnings logged during execution:
May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font GWNIWZ+HelveticaLTStd-Roman
java.io.IOException: head is mandatory
at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:182)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:150)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:106)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:114)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:67)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:138)
at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:88)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:96)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:849)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:142)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:264)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:338)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:259)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:245)
Additional messages (stack traces truncated):
May 26, 2023 12:40:00 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font UBAPGG+OptimaLTStd
java.io.IOException: head is mandatory
May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font GZVYQO+OptimaLTStd-Bold
java.io.IOException: head is mandatory
May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font FUYTUP+OptimaLTStd-Italic
java.io.IOException: head is mandatory
May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font FSNSYC+OptimaLTStd
java.io.IOException: head is mandatory
Although fallback fonts seem to be used, they don't work either.
May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 findFontOrSubstitute
WARNING: Using fallback font LiberationSans for CID-keyed TrueType font GWNIWZ+HelveticaLTStd-Roman
I also see warning messages like the one below and am unsure how to address them.
May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.graphics.color.PDICCBased ensureDisplayProfile
WARNING: ICC profile is Perceptual, ignoring, treating as Display class
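If the goal is only to keep such warnings out of the output, one option is to raise the log level. This is a sketch under an assumption: the timestamp and "WARNING:" layout above look like java.util.logging output, which is where Commons Logging (used by PDFBox 2.x) ends up when no other backend is configured. The class name PdfBoxLogLevels and the method showOnlySevere are hypothetical. It hides messages only and does not change how pages are rendered.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes java.util.logging is the active logging backend.
final class PdfBoxLogLevels {

    // Keep strong references: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced Logger can be lost after GC.
    private static final Logger PDFBOX = Logger.getLogger("org.apache.pdfbox");
    private static final Logger FONTBOX = Logger.getLogger("org.apache.fontbox");

    /** Suppresses everything below SEVERE for the PDFBox and FontBox packages. */
    static void showOnlySevere() {
        PDFBOX.setLevel(Level.SEVERE);
        FONTBOX.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        showOnlySevere();
        // Child loggers without their own level inherit it from the package logger.
        Logger child = Logger.getLogger("org.apache.pdfbox.pdmodel.graphics.color.PDICCBased");
        System.out.println("WARNING loggable: " + child.isLoggable(Level.WARNING));
    }
}
```

Calling showOnlySevere() once before rendering should be enough. While the missing-text problem is still being diagnosed, it is probably better to leave the font warnings visible.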
I need help with two things.
Question 1: How do I add a font?
    int position = 0;
    PDPage page = getDocument().getPage(position);
    PDResources resources = page.getResources();
    OTFParser otfParser = new OTFParser();
    OpenTypeFont otf = otfParser.parse(new File("OptimaLTStd.otf"));
    PDFont font = PDType0Font.load(document, otf, false);
    resources.add(font);
    page.setResources(resources);
    if (position == 0) {
        getDocument().getPages().remove(page);
        getDocument().getPages().add(page);
        setDocument(getDocument());
        setPdfRenderer(getDocument());
    } else {
        PDPage prevPage = getDocument().getPage(position - 1);
        getDocument().getPages().insertBefore(page, prevPage);
        setDocument(getDocument());
        setPdfRenderer(getDocument());
    }
Question 2: Is there an override in PDFRenderer to skip glyph processing, so that font-related issues do not affect image generation?
Upvotes: 1
Views: 888
Reputation: 18861
The missing text is caused by zero-width definitions for the fonts in the PDF, which incorrectly influence a "stretching" algorithm when rendering. This has been fixed in ticket PDFBOX-5611 and will be in version 2.0.29. Until then, a snapshot build will be available.
Upvotes: 0