Reputation: 241
I'm trying to convert a PDF to a PNG file using PDFBox. Unfortunately, the output has odd red areas in some places, and I'm not sure what causes them. It only happens with some PDF files.
Here's some of the code that I'm using:
public static BufferedImage generateFromPdf(String ref, InputStream stream, int pageIndex, PreviewMode mode) throws IOException {
    PDDocument doc = null;
    try (InputStream buffered = new BufferedInputStream(stream)) {
        doc = PDDocument.load(buffered, PDF_LOADING_MEMORY_SETTING);
        // pageIndex is 1-based, so valid values are 1..getNumberOfPages()
        if (pageIndex < 1 || pageIndex > doc.getNumberOfPages()) {
            return null;
        }
        PDFRenderer renderer = new PDFRenderer(doc);
        return rasterizePdfBox(ref, pageIndex, renderer, mode);
    } finally {
        if (doc != null) {
            doc.close();
        }
    }
}
and then:
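private static BufferedImage rasterizePdfBox(String ref, int pageIndex, PDFRenderer renderer, PreviewMode mode) throws IOException {
    Future<BufferedImage> result = executorService.submit(() -> {
        LOGGER.info(String.format("Generate preview for ref: %s, page: %s, mode: %s", ref, pageIndex, mode.name()));
        // pageIndex is 1-based; PDFRenderer expects a 0-based page index
        return renderer.renderImageWithDPI(pageIndex - 1, mode.getDpi(), ImageType.RGB);
    });
    try {
        return result.get();
    } catch (InterruptedException e) {
        LOGGER.error(String.format("Interrupted when generating preview: %s", e.getMessage()));
        // Restore the interrupt flag only when this thread was actually interrupted
        Thread.currentThread().interrupt();
        throw new IOException(e.getMessage(), e);
    } catch (ExecutionException e) {
        LOGGER.error(String.format("Error when generating preview: %s", e.getMessage()));
        // Keep the original exception as the cause so the stack trace is not lost
        throw new IOException(e.getMessage(), e);
    }
}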
So far I've only figured out that the areas that come out red are blank when I open the file in Master PDF Editor on Linux. They look normal, though, when I open it with Document Viewer.
Some hints:
- The PDFs with problems have been scanned. I can select text around the parts that render correctly, but not in the places covered by the red overlay. Maybe it has something to do with OCR?
- If I run the Linux tool convert not-working-pdf.pdf converted.pdf and then convert the resulting file to PNG, the issue is gone.
Here's an example file: https://ufile.io/3or9l
pdfbox version: 2.0.13
Upvotes: 2
Views: 299
Reputation: 18906
This was a PDFBox bug. The cause was a bitonal image with a mask, which is unusual. The raster has only one color component, so its value was applied only to the "R" band of the RGB destination instead of to all three bands. Because of that, white appeared as red.
More details are in issue PDFBOX-4470. It will be fixed in release 2.0.14; until then, you can work with a snapshot build.
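To illustrate the effect, here is a minimal sketch using only java.awt.image. It is not the PDFBox code; the class and method names (SingleBandDemo, whiteSingleBand, copyFirstBandOnly, copyToAllBands) are made up for this example. It assumes a one-band source raster whose samples are all 255 (white) and shows what happens when that single band is written only to band 0 of an RGB image versus to all three bands.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

public class SingleBandDemo {

    // Hypothetical helper: a w x h raster with ONE band, every sample 255 (white).
    static WritableRaster whiteSingleBand(int w, int h) {
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = gray.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                raster.setSample(x, y, 0, 255);
            }
        }
        return raster;
    }

    // Mimics the faulty behavior: only band 0 (R) of the destination is written.
    static BufferedImage copyFirstBandOnly(Raster src) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        WritableRaster out = dst.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                out.setSample(x, y, 0, src.getSample(x, y, 0));
            }
        }
        return dst;
    }

    // Expected behavior: the single source band is replicated into R, G and B.
    static BufferedImage copyToAllBands(Raster src) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        WritableRaster out = dst.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int value = src.getSample(x, y, 0);
                for (int band = 0; band < 3; band++) {
                    out.setSample(x, y, band, value);
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        WritableRaster src = whiteSingleBand(2, 2);
        // Mask off the alpha byte that getRGB() always reports for opaque images.
        System.out.println(Integer.toHexString(copyFirstBandOnly(src).getRGB(0, 0) & 0xFFFFFF)); // ff0000
        System.out.println(Integer.toHexString(copyToAllBands(src).getRGB(0, 0) & 0xFFFFFF));    // ffffff
    }
}
```

The G and B bands of a fresh TYPE_INT_RGB image start at 0, so filling only R with 255 turns a white pixel into pure red, which matches the red areas in the rendered output.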
Upvotes: 1