user3423568

Reputation: 811

Convert PDF files to images with PDFBox

Can someone give me an example of how to use Apache PDFBox to convert a PDF file into separate images (one for each page of the PDF)?

Upvotes: 78

Views: 100222

Answers (7)

Tilman Hausherr

Reputation: 18861

Solution for 1.8.* versions:

PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
for (PDPage pdPage : pdPages)
{ 
    ++page;
    BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
    ImageIOUtil.writeImage(bim, pdfFilename + "-" + page + ".png", 300);
}
document.close();

Don't forget to read the 1.8 dependencies page before doing your build.

Solution for the 2.0 version:

PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page)
{ 
    BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);

    // suffix in filename will be used as the file format
    ImageIOUtil.writeImage(bim, pdfFilename + "-" + (page+1) + ".png", 300);
}
document.close();

Solution for the 3.0 versions:

PDDocument document = Loader.loadPDF(new File(pdfFilename));

(the rest is like in 2.0)

The ImageIOUtil class is in a separate download / artifact (pdfbox-tools). Read the 2.0 dependencies page before doing your build: you'll need extra jar files for PDFs with JBIG2 images, for saving TIFF images, and for reading encrypted files.

Make sure you have logging enabled and are using the latest release of whatever JDK version you prefer. For example, if you are using JDK 8, don't use 1.8.0_5; use 1.8.0_391 or whatever is the latest at the time you're reading this. Early JDK releases were very slow.
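
The DPI argument in the calls above determines how large each rendered image is. PDF page sizes are expressed in points (1/72 inch), so the pixel size scales with dpi / 72. Here is a minimal sketch of that arithmetic using only the JDK. The class and method names are made up for illustration and are not part of PDFBox, and PDFBox's own rounding may differ by a pixel:

```java
/** Illustrative helper (not part of PDFBox): estimates rendered image sizes. */
class DpiMath {

    /** PDF user space uses 72 points per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in PDF points to pixels at the given DPI. */
    static int pointsToPixels(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Rough memory estimate, assuming the image stores one 4-byte int per pixel. */
    static long estimatedBytes(int widthPx, int heightPx) {
        return 4L * widthPx * heightPx;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches)
        int width = pointsToPixels(612, 300);
        int height = pointsToPixels(792, 300);
        System.out.println(width + " x " + height + " px, roughly "
                + estimatedBytes(width, height) / (1024 * 1024) + " MiB per page in memory");
    }
}
```

A Letter page at 300 DPI comes out at 2550 x 3300 pixels, so for long documents it is probably worth writing each page out before rendering the next rather than keeping all the images in memory.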

Upvotes: 138

user1409784

Reputation:

Just adding the following snippet for the new Apache PDFBox version 3 (3.0.0-RC1):

try (PDDocument pddDoc = Loader.loadPDF(docFile)) {
    PDFRenderer pr = new PDFRenderer(pddDoc);
    BufferedImage backImage = pr.renderImage(0);
} catch (IOException e) {
    e.printStackTrace();
}

Notes

  • Version 3 (3.0.0-RC1) works with GraalVM, and the recently released Liberica Native Image Kit allows AWT to be used on Linux/Windows/Mac
  • PDDocument.load etc. are replaced by the new org.apache.pdfbox.Loader class

Upvotes: 2

Sarie Chafiq

Reputation: 21

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.nio.file.Path;

public class Pdf2Image {

    public String convertPdf2Img(String fileInput, Path path) {
        String destDir = "";
        String destinationDir = path.toString();
        File sourceFile = new File(fileInput);
        File destinationFile = new File(destinationDir);

        // mkdirs() also creates any missing parent directories
        if (!destinationFile.exists() && destinationFile.mkdirs()) {
            System.out.println("Folder Created -> " + destinationFile.getAbsolutePath());
        }

        if (!sourceFile.exists()) {
            System.err.println(sourceFile.getName() + " File does not exist");
            return destDir;
        }

        // try-with-resources closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(sourceFile)) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            String fileName = sourceFile.getName().replace(".pdf", "");

            // page indexes are 0-based, so the first image is <name>_0.png
            for (int pageNumber = 0; pageNumber < document.getNumberOfPages(); ++pageNumber) {
                BufferedImage bim = pdfRenderer.renderImage(pageNumber);
                destDir = destinationDir + File.separator + fileName + "_" + pageNumber + ".png";
                ImageIO.write(bim, "png", new File(destDir));
            }

            System.out.println("Image saved at -> " + destinationFile.getAbsolutePath());
        } catch (Exception e) {
            e.printStackTrace();
        }

        // path of the last image written, or "" if nothing was written
        return destDir;
    }

}
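
The file naming in the class above can be made a little more robust. `replace(".pdf", "")` removes every occurrence of ".pdf" in the name, and 0-based, unpadded page numbers sort badly once there are ten or more pages. Here is a sketch of one alternative using only the JDK. `PageImageNames` and its methods are hypothetical helper names, not PDFBox API:

```java
import java.util.Locale;

/** Hypothetical helper for building per-page image file names. */
class PageImageNames {

    /** Strips a trailing ".pdf" (any letter case) and nothing else. */
    static String baseName(String pdfFileName) {
        int start = pdfFileName.length() - 4;
        // regionMatches returns false when start is negative, so short names are safe
        return pdfFileName.regionMatches(true, start, ".pdf", 0, 4)
                ? pdfFileName.substring(0, start)
                : pdfFileName;
    }

    /** Builds "name_NN.ext" with a 1-based page number padded to the width of pageCount. */
    static String pageImageName(String pdfFileName, int pageIndex, int pageCount, String extension) {
        int digits = String.valueOf(pageCount).length();
        return String.format(Locale.ROOT, "%s_%0" + digits + "d.%s",
                baseName(pdfFileName), pageIndex + 1, extension);
    }

    public static void main(String[] args) {
        // name for the first page (index 0) of a 12-page document
        System.out.println(pageImageName("report.pdf", 0, 12, "png"));
    }
}
```

With padding, a plain alphabetical directory listing shows the pages in document order.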

Upvotes: 1

Rodolfo Velasco

Reputation: 855

Here is part of my code to convert a PDF, received as a multipart file, to a JPG thumbnail. I'm saving the image as a Base64 string. PDFBox 2.0.21 was used.

private static String generatePdfThumbnail(byte[] imageInBytesArray) throws IOException {
    // try-with-resources closes the document once the first page is rendered
    try (PDDocument document = PDDocument.load(imageInBytesArray)) {
        PDFRenderer renderer = new PDFRenderer(document);
        // render only the first page (index 0) as the thumbnail
        BufferedImage bufferedImage = renderer.renderImage(0);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found
        boolean foundWriter = ImageIO.write(bufferedImage, "jpg", baos);
        if (!foundWriter) {
            return "";
        }

        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }
}
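
The encoding half of this method can be tried without PDFBox by substituting any BufferedImage for the rendered page. The sketch below uses only the JDK. `ThumbnailEncoder` is a hypothetical name, and the conversion to an opaque RGB image is an added assumption: JPEG has no alpha channel, and JDK versions differ in how they handle images that have one.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

/** Hypothetical helper: encodes an image as a Base64 JPEG string. */
class ThumbnailEncoder {

    /** Paints the source onto a white, opaque RGB canvas of the same size. */
    static BufferedImage toOpaqueRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose(); // always release the graphics context
        }
        return rgb;
    }

    /** Returns the JPEG bytes as Base64, or "" if no JPEG writer accepted the image. */
    static String toBase64Jpeg(BufferedImage image) {
        BufferedImage opaque = image.getColorModel().hasAlpha() ? toOpaqueRgb(image) : image;
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(opaque, "jpg", baos)) {
                return "";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    public static void main(String[] args) {
        // a blank image with an alpha channel stands in for a rendered PDF page
        BufferedImage page = new BufferedImage(200, 280, BufferedImage.TYPE_INT_ARGB);
        System.out.println(toBase64Jpeg(page).length() + " Base64 characters");
    }
}
```

The returned string can be embedded in a `data:image/jpeg;base64,` URI if the thumbnail is displayed in a browser.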

Upvotes: 0

chris01

Reputation: 12331

I tried it today with PdfBox 2.0.15.

import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.rendering.*;
import java.awt.image.*;
import java.io.*;
import javax.imageio.*;


public static void PDFtoJPG (String in, String out) throws Exception
{
    // try-with-resources closes the document after rendering
    try (PDDocument pd = PDDocument.load (new File (in)))
    {
        PDFRenderer pr = new PDFRenderer (pd);
        // render only the first page (index 0) at 300 DPI
        BufferedImage bi = pr.renderImageWithDPI (0, 300);
        ImageIO.write (bi, "JPEG", new File (out));
    }
}
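
`ImageIO.write` uses the JPEG writer's default compression settings. If the output size or quality matters, the JDK's `ImageWriter` and `ImageWriteParam` classes let you set the quality explicitly. The sketch below uses only the JDK, with a noise image standing in for a rendered page. `JpegQuality` and its method names are made up for illustration:

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Random;

/** Hypothetical helper: JPEG encoding with an explicit quality setting. */
class JpegQuality {

    /** Encodes the image as JPEG; quality ranges from 0.0 (smallest) to 1.0 (best). */
    static byte[] encodeJpeg(BufferedImage image, float quality) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IllegalStateException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(baos)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose(); // release the writer's native resources
        }
        return baos.toByteArray();
    }

    /** Deterministic RGB noise, used here only as stand-in image content. */
    static BufferedImage noiseImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(42);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = noiseImage(64, 64);
        System.out.println("quality 0.10: " + encodeJpeg(image, 0.10f).length + " bytes");
        System.out.println("quality 0.95: " + encodeJpeg(image, 0.95f).length + " bytes");
    }
}
```

For scanned or photographic pages JPEG is usually the smaller format. For pages that are mostly text and line art, PNG avoids compression artifacts around the glyphs.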

Upvotes: 21

rashedmedisys

Reputation: 79

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PDFtoJPGConverter {

    public List<File> convertPdfToImage(File file, String destination) throws Exception {
        File destinationFile = new File(destination);

        if (destinationFile.exists()) {
            System.out.println("DESTINATION FOLDER ALREADY CREATED!!!");
        } else if (destinationFile.mkdirs()) {
            System.out.println("DESTINATION FOLDER CREATED -> " + destinationFile.getAbsolutePath());
        } else {
            System.out.println("DESTINATION FOLDER NOT CREATED!!!");
        }

        if (!file.exists()) {
            System.err.println(file.getName() + " FILE DOES NOT EXIST");
            return null;
        }

        List<File> fileList = new ArrayList<File>();
        String fileName = file.getName().replace(".pdf", "");
        System.out.println("CONVERTER START.....");

        // try-with-resources closes the document even if a page fails to render
        try (PDDocument doc = PDDocument.load(file)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                // images go to the destination folder; new File(parent, child) adds
                // the path separator if it is missing. To write next to the source
                // PDF instead, use file.getParentFile() as the parent.
                File fileTemp = new File(destinationFile, fileName + "_" + i + ".jpg"); // jpg or png
                // 200 is a sample DPI value; change it if necessary
                BufferedImage image = renderer.renderImageWithDPI(i, 200);
                ImageIO.write(image, "JPEG", fileTemp); // JPEG or PNG
                fileList.add(fileTemp);
            }
        }
        System.out.println("CONVERTER STOPPED.....");
        System.out.println("IMAGE SAVED AT -> " + destinationFile.getAbsolutePath());
        return fileList;
    }

    public static void main(String[] args) {
        try {
            PDFtoJPGConverter converter = new PDFtoJPGConverter();
            Scanner sc = new Scanner(System.in);
            System.out.println("Enter the destination folder for the images");
            // Destination = D:/PPL/
            String destination = sc.nextLine();

            System.out.println("Enter the PDF file names, including the source folder, separated by commas");
            // Source Path = D:/PDF/ant.pdf,D:/PDF/abc.pdf,D:/PDF/xyz.pdf
            String sourcePathWithFileName = sc.nextLine();
            // check string content with isEmpty(); != only compares references
            if (sourcePathWithFileName != null && !sourcePathWithFileName.trim().isEmpty()) {
                for (String file : sourcePathWithFileName.split(",")) {
                    File pdf = new File(file.trim());
                    System.out.println("FILE:>> " + pdf);
                    converter.convertPdfToImage(pdf, destination);
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
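
The comma-separated input handling in `main` is easy to get subtly wrong, because stray spaces and empty entries between commas turn into bogus file names. One way to isolate it is a small parsing helper that can be tested without PDFBox. `PathListParser` is a hypothetical name used only for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses a comma-separated list of file paths. */
class PathListParser {

    /** Splits on commas, trims each entry, and drops empty entries. */
    static List<String> parsePathList(String input) {
        List<String> paths = new ArrayList<>();
        if (input == null) {
            return paths;
        }
        for (String part : input.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // spaces around entries and the empty entry between commas are ignored
        for (String path : parsePathList(" D:/PDF/ant.pdf, D:/PDF/abc.pdf ,, D:/PDF/xyz.pdf")) {
            System.out.println(path);
        }
    }
}
```

Note that this simple split cannot handle file names that themselves contain commas. If that matters, passing one path per command-line argument may be a better fit.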

====================================

Here I am using the Apache pdfbox-2.0.8, commons-logging-1.2 and fontbox-2.0.8 libraries.

HAPPY CODING :)

Upvotes: 4

kittyminky

Reputation: 485

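Without writing any rendering code yourself, you can use the PDFToImage class that ships with PDFBox (it lives in the org.apache.pdfbox.tools package, i.e. the pdfbox-tools artifact or the standalone pdfbox-app jar).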

Kotlin:

PDFToImage.main(arrayOf<String>("-outputPrefix", "newImgFilenamePrefix", existingPdfFilename))

Other configuration options: https://pdfbox.apache.org/docs/2.0.8/javadocs/org/apache/pdfbox/tools/PDFToImage.html

Upvotes: 2
