Reputation: 3025
When I print a PDF file with a PS driver and then convert the PS file to a searchable PDF with Ghostscript (pdfwrite device), the final PDF file comes out corrupt. In some cases the space character disappears, and in other cases the text width becomes too large, so text overlaps other text.
The gs settings are -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dEmbedAllFonts=true -dSubsetFonts=false -sOutputFile=output.pdf input.ps
I am wondering whether Ghostscript just can't produce good output when the source file is a PDF. If I print a Word document, everything works fine!
Are there any other solutions, such as using an XPS driver and converting the XPS file to a searchable PDF instead? Are there any solutions out there that can do this?
I use gs 9.07.
Best regards Joe
Upvotes: 0
Views: 1580
Reputation: 31139
Why are you going through the step of printing the PDF file to a PostScript file? Ghostscript is already capable of accepting a PDF file as input.
This simply adds more confusion; it certainly won't add anything useful.
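As a rough sketch of skipping the PostScript step, the snippet below builds the same command line as in the question but hands the PDF to Ghostscript directly. The function names `build_gs_command` and `convert`, the `gs_binary` parameter, and the file names are illustrative assumptions; the switches are the ones from the question, with -dBATCH in upper case because Ghostscript switches are case-sensitive.

```python
import subprocess


def build_gs_command(input_pdf, output_pdf, gs_binary="gs"):
    """Return the argument list for a direct PDF-to-PDF pdfwrite run.

    gs_binary is an assumption: on Windows the executable is usually
    named gswin32c or gswin64c rather than gs.
    """
    return [
        gs_binary,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dEmbedAllFonts=true",
        "-dSubsetFonts=false",
        f"-sOutputFile={output_pdf}",
        input_pdf,  # the original PDF, not a PostScript file printed from it
    ]


def convert(input_pdf, output_pdf, gs_binary="gs"):
    """Run Ghostscript and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(input_pdf, output_pdf, gs_binary), check=True)
```

Calling `convert("input.pdf", "output.pdf")` would then run the conversion, assuming Ghostscript is installed and on the PATH.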
It's not possible to say what the problem 'might' be without seeing the original PDF file and the PostScript file produced by your driver. My guess would be that whatever application is processing the PDF hasn't embedded the font, or that the PostScript driver hasn't been able to convert the font into something suitable for PostScript, resulting in the font being missing in the output, and the pdfwrite device having to substitute 'something else' for the missing font.
Ghostscript (more accurately the pdfwrite device) is perfectly capable of producing a decent PDF file when the input is PDF, but your input isn't PDF, it's PostScript!
To be perfectly honest, if your original PDF file isn't 'searchable', it's very unlikely that the PDF file produced by pdfwrite will be either, no matter whether you use the original PDF or mangle it into PostScript instead.
The usual reason why a PDF file is not 'searchable' is that there is no ToUnicode information, the font is encoded with a custom encoding, and it does not use standard glyph names. If this is the case there is nothing you can do with the PDF file except OCR it.
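A crude way to get a first hint about missing ToUnicode information is to scan the raw bytes of the PDF for font dictionaries and /ToUnicode keys, as in the sketch below. The function name `font_unicode_report` is made up for illustration, and the check is only a heuristic: files that store their objects in compressed object streams hide these keys from a raw byte scan, so a count of zero proves nothing on its own. A proper font listing tool (for example pdffonts from Poppler, which has a "uni" column) is more reliable.

```python
import re


def font_unicode_report(pdf_bytes):
    """Count font dictionaries and /ToUnicode keys in raw PDF bytes.

    Heuristic only: dictionaries inside compressed object streams are
    not visible to this scan.
    """
    # \b stops /Font from also matching /FontDescriptor or /FontFile.
    fonts = len(re.findall(rb"/Type\s*/Font\b", pdf_bytes))
    to_unicode = len(re.findall(rb"/ToUnicode\b", pdf_bytes))
    return {"fonts": fonts, "to_unicode": to_unicode}


def report_for_file(path):
    """Read a PDF from disk and return the same report."""
    with open(path, "rb") as handle:
        return font_unicode_report(handle.read())
```

If `report_for_file("input.pdf")` shows several fonts but no /ToUnicode entries in an uncompressed file, that would fit the situation described above, where text extraction and searching depend on the encoding and glyph names alone.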
Upvotes: 3