Parsing PDFs containing special fonts in PHP

Question

I am using smalot/pdf-parser in a Zend Framework 3 project to extract the content of PDF files. Until now the files were PDF 1.3 with a basic font. The system that generates them will be updated soon, and the files will then be PDF 1.5 with a few specific fonts.

When I try to parse the new files, I get this error:

Object list not found. Possible secured file.

I tried converting a file to a lower PDF version and could then parse it, but the special characters from the special fonts come out wrong. Since we receive a lot of these PDFs, manually converting each file before uploading it into our system is not a viable option.

I also tried installing the fonts used in the files into the TCPDF library. The error remains.

When I create a PDF 1.5 file with a basic font, I can read it, so I'm fairly sure the error could be solved by handling the fonts properly or by converting the fonts in the PDF.

I found this issue with two possible solutions. First, someone suggested installing the font into the TCPDF package. I did that, but it didn't work, although I'm not 100% sure that I got all the fonts. Is there a way to debug this with TCPDF?

Second, someone mentioned:

I changed the code for the escaping sequences I was interested in.

which solved the issue for them. But I don't know how to do that.

Lapskaus · Accepted Answer

I ended up using Ghostscript to convert the PDFs to PDF 1.4 before parsing them:

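// Quote both paths so spaces or shell metacharacters cannot break the command.
$cmd = 'gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dBATCH -dNOPAUSE'
    . ' -sOutputFile=' . escapeshellarg($outputFile) . ' ' . escapeshellarg($inputFile);
exec($cmd);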
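
Because a bare exec() call hides failures, here is a minimal sketch of how the conversion could be wrapped so that a missing file or a failing Ghostscript run is reported instead of silently producing no output. The function names buildGsCommand and convertPdf, the $level parameter, and the version check are my own assumptions for illustration, not part of the original answer. It assumes the gs binary is on the PATH.

```php
<?php
declare(strict_types=1);

/**
 * Build the Ghostscript command that rewrites a PDF at a lower
 * compatibility level. Hypothetical helper, named for this sketch only.
 */
function buildGsCommand(string $inputFile, string $outputFile, string $level = '1.4'): string
{
    // Only accept plain PDF version numbers such as 1.3 or 1.4.
    if (!preg_match('/^1\.[0-7]$/', $level)) {
        throw new InvalidArgumentException("Unsupported PDF level: $level");
    }

    // escapeshellarg() quotes the paths so spaces cannot split them.
    return sprintf(
        'gs -sDEVICE=pdfwrite -dCompatibilityLevel=%s -dBATCH -dNOPAUSE -sOutputFile=%s %s',
        $level,
        escapeshellarg($outputFile),
        escapeshellarg($inputFile)
    );
}

/**
 * Run the conversion and throw if the input is unreadable or gs fails.
 * Hypothetical helper, named for this sketch only.
 */
function convertPdf(string $inputFile, string $outputFile, string $level = '1.4'): void
{
    if (!is_readable($inputFile)) {
        throw new RuntimeException("Cannot read input file: $inputFile");
    }

    // Capture stderr as well, so the Ghostscript message ends up in $output.
    exec(buildGsCommand($inputFile, $outputFile, $level) . ' 2>&1', $output, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException(
            "Ghostscript failed with exit code $exitCode: " . implode("\n", $output)
        );
    }
}
```

After convertPdf($inputFile, $outputFile) returns, the converted file can be handed to smalot/pdf-parser as before, for example with (new \Smalot\PdfParser\Parser())->parseFile($outputFile)->getText().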
