PDFBox ERROR Could not load font file

Question

I am unsure if it is a PDFBox issue. But mentioning it might help understand my issue.

So I have been getting a lot of these warnings coming from PDFBox:

WARN  No Unicode mapping for a37 (37) in font TCBLZV+LCIRCLE10

This is one out of 100s.

So I decided to add the LCRICLE10 font and other fonts that are mentioned in the warning list.

Here are the fonts I downloaded:

Here is the PDFBox error I am getting:

5517 ERROR Could not load font file: /home/$USER/.fonts/bakoma/pfb/eurb9.pfb
5518 java.io.IOException: Found Token[kind=NAME, text=dup] but expected INTEGER
5519    at org.apache.fontbox.type1.Type1Parser.read(Type1Parser.java:812)
5520    at org.apache.fontbox.type1.Type1Parser.readEncoding(Type1Parser.java:226)
5521    at org.apache.fontbox.type1.Type1Parser.parseASCII(Type1Parser.java:135)
5522    at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:61)
5523    at org.apache.fontbox.type1.Type1Font.createWithPFB(Type1Font.java:56)
5524    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addType1Font(FileSystemFontProvider.java:646)
5525    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:255)
5526    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.(FileSystemFontProvider.java:225)
5527    at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.(FontMapperImpl.java:130)
5528    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:149)
5529    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:413)
5530    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:376)
5531    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:350)
5532    at org.apache.pdfbox.pdmodel.font.PDType1Font.(PDType1Font.java:146)
5533    at org.apache.pdfbox.pdmodel.font.PDType1Font.(PDType1Font.java:79)
5534    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62)
5535    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
5536    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
5537    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
5538    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
5539    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
5540    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
5541    at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
5542    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
5543    at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
5544    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
5545    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
5546    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
5547    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
5548    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
5549    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
5550    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
5551    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:205)
5552    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:486)
5553    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)

It's one out of many others.

Here is small list:

ERROR Could not load font file: /home/$USER/.fonts/bakoma/pfb/eufm6.pfb
ERROR Could not load font file: /home/$USER/.fonts/bakoma/pfb/euex9.pfb
ERROR Could not load font file: /home/$USER/.fonts/bakoma/pfb/eusm10.pfb
ERROR Could not load font file: /home/$USER/.fonts/bakoma/pfb/cmmi7.pfb
ERROR Could not load font file: /home/$USER/.fonts/bakoma/pfb/msam6.pfb

They seem to all come from: .fonts/bakoma/pfb/

When I went on FireFox I saw this:

I removed the fonts from ~/.fonts/ and clear the font cache and now everything is back to normal.

Tilman Hausherr · Accepted Answer

The "WARN No Unicode mapping" messages are only relevant if you do text extraction, i.e. your text will be nothing for that glyph because the unicode mapping is missing. "TCBLZV+LCIRCLE10" indicates an embedded font subset, so adding fonts won't help anyway. See also this: https://pdfbox.apache.org/2.0/faq.html#notext

So your real question ends there, it doesn't get better by loading fonts, unless you'd have trouble with non-embedded fonts.

The error "Found Token[kind=NAME, text=dup] but expected INTEGER" indicates an error parsing a type 1 font. This can be a syntax error in the font, or a bug in the PDFBox type 1 font parser. I rather suspect the later, because type 1 fonts are based on PostScript and the PDFBox parser can recognize only a subset of it.

Update: I looked at the eurb9.pfb font. It is like I suspected, the ASCII part of the font has a calculation ("dup dup 161 10 getinterval 0 exch putinterval dup dup 173 23 getinterval 10 exch putinterval dup dup 127 exch 196 get put readonly def") and we can't parse it. Our own type 1 parser can only parse elements that don't calculate. (This still covers 99% of type1 fonts)

PDFBox ERROR Could not load font file

Answers (1)

Related Questions