Ojas Pednekar


PDDocument.load(file) isn't a method (PDFBox)

I wanted to write a simple Java program that extracts the text content of a PDF file. Here is the code:

    PDFTextStripper ts = new PDFTextStripper();
    File file = new File("C:\\Meeting IDs.pdf");
    PDDocument doc1 = PDDocument.load(file); // fails to compile with PDFBox 3.0.0
    String allText = ts.getText(doc1);
    String gradeText = allText.substring(allText.indexOf("GRADE 10B"), allText.indexOf("GRADE 10C"));
    System.out.println("Meeting ID for English: "
            + gradeText.substring(gradeText.indexOf("English") + 7, gradeText.indexOf("English") + 20));

This is only part of the code, but it is the part with the problem. The error is:

    The method load(File) is undefined for the type PDDocument


I learned how to use PDFBox from JavaTPoint. I followed the instructions for installing the PDFBox libraries and adding them to the build path. My PDFBox version is 3.0.0. I have also searched the source files and their methods, and I cannot find the load method there.

Thank you in advance.


sorifiend

As per the PDFBox 3.0 migration guide, PDDocument.load has been replaced by the static loadPDF methods of the new Loader class:

For loading a PDF PDDocument.load has been replaced with the Loader methods. The same is true for loading a FDF document.

When saving a PDF this will now be done in compressed mode per default. To override that use PDDocument.save with CompressParameters.NO_COMPRESSION.

PDFBox now loads a PDF Document incrementally reducing the initial memory footprint. This will also reduce the memory needed to consume a PDF if only certain parts of the PDF are accessed. Note that, due to the nature of PDF, uses such as iterating over all pages, accessing annotations, signing a PDF etc. might still load all parts of the PDF over time leading to a similar memory consumption as with PDFBox 2.0.

The input file must not be used as output for saving operations. It will corrupt the file and throw an exception as parts of the file are read the first time when saving it.
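To illustrate the two saving points in the quote, here is a minimal sketch (not taken from the guide). It assumes PDFBox 3.0 on the classpath and the PDDocument.save(File, CompressParameters) overload with CompressParameters from org.apache.pdfbox.pdfwriter.compress, which is what the guide refers to; the class SaveUncompressed and the method saveUncompressedCopy are hypothetical names. It refuses to save over the input file and writes an uncompressed copy to a different path.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
import org.apache.pdfbox.pdmodel.PDDocument;

public class SaveUncompressed {

    // Hypothetical helper: writes an uncompressed copy of 'input' to 'output'.
    // PDFBox 3.x reads the input lazily, so saving over it would corrupt it;
    // the canonical-path check catches the obvious case before anything is loaded.
    static void saveUncompressedCopy(File input, File output) throws IOException {
        if (input.getCanonicalFile().equals(output.getCanonicalFile())) {
            throw new IllegalArgumentException("output must be a different file than input");
        }
        try (PDDocument doc = Loader.loadPDF(input)) {
            // 3.x saves in compressed mode by default; NO_COMPRESSION turns that off.
            doc.save(output, CompressParameters.NO_COMPRESSION);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SaveUncompressed <input.pdf> <output.pdf>");
            return;
        }
        saveUncompressedCopy(new File(args[0]), new File(args[1]));
    }
}
```

If you are happy with compressed output, the plain doc.save(output) call is enough; the check against the input path is the part that matters either way.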

So you can either downgrade to a 2.x version of PDFBox, where PDDocument.load still exists, or switch to the new Loader class. I believe this should work:

    // PDFBox 3.x; requires: import org.apache.pdfbox.Loader;
    File file = new File("C:\\Meeting IDs.pdf");
    PDDocument doc1 = Loader.loadPDF(file);
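For completeness, here is a minimal sketch of the question's program rewritten for PDFBox 3.x. It assumes PDFBox 3.0 is on the classpath and keeps the question's file path and markers ("GRADE 10B", "GRADE 10C", "English"); the class name MeetingIdReader and the helper between are hypothetical. It also closes the document with try-with-resources and checks the indexOf results instead of letting substring throw when a marker is missing.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class MeetingIdReader {

    // Hypothetical helper: text from startMarker up to (not including) endMarker,
    // or null if either marker is missing (indexOf returns -1).
    static String between(String text, String startMarker, String endMarker) {
        int start = text.indexOf(startMarker);
        if (start < 0) {
            return null;
        }
        // Search for the end marker only after the start marker.
        int end = text.indexOf(endMarker, start);
        if (end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) throws IOException {
        File file = new File("C:\\Meeting IDs.pdf");
        // PDFBox 3.x: Loader.loadPDF replaces PDDocument.load.
        // try-with-resources closes the document even if extraction throws.
        try (PDDocument doc = Loader.loadPDF(file)) {
            String allText = new PDFTextStripper().getText(doc);
            String gradeText = between(allText, "GRADE 10B", "GRADE 10C");
            if (gradeText == null) {
                System.out.println("Section GRADE 10B not found");
                return;
            }
            int english = gradeText.indexOf("English");
            if (english < 0) {
                System.out.println("No English entry in GRADE 10B");
                return;
            }
            // Same slice as the question: the 13 characters after "English".
            int from = english + "English".length();
            int to = Math.min(from + 13, gradeText.length());
            System.out.println("Meeting ID for English: " + gradeText.substring(from, to));
        }
    }
}
```

The 13-character slice mirrors the original +7/+20 offsets; whether that matches the actual layout of the PDF is an assumption carried over from the question.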
