Reputation: 3833

Lucene for Java with PDFBox getting a nullpointer exception

I'm frustrated with the PDFBox API.

I have done:

PDDocument pdfDocument = PDDocument.load(new File("text.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
String s =  stripper.getText(pdfDocument);
pdfDocument.close();

but I'm getting a

Exception in thread "main" java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at lucene.test.main(test.java:47)

The exception is thrown at this line:

String s = stripper.getText(pdfDocument);

I have absolutely no idea why. Creating a PDF by following the tutorial (http://pdfbox.apache.org/cookbook/textextraction.html) works fine, but this text extraction does not. I have already searched a lot, but nothing helped.

Btw I still work with the "pdfbox-0.7.3.jar" because the new "pdfbox-1.8.2.jar" didn't work for me. Could this be the reason?

Thx for help.

PS: I'm getting the same error when using "stripper.writeText()"

Upvotes: 0

Answers (4)

kwoxer

Reputation: 3833

Instead of

PDDocument pdfDocument = PDDocument.load(new File("text.pdf"));

just use

PDDocument pdfDocument = PDDocument.load("C:\\TEMP\\text.pdf");

I'm not sure why, but it works for me now, even with the old PDFBox 0.7.3.

Upvotes: 2

user11367081

Reputation: 11

Add the following external JARs:

pdfbox-1.3.1
commons-logging-1.2

Java Code:

// Note: org.apache.pdfbox.multipdf is the PDFBox 2.x package; in 1.x Splitter lives in org.apache.pdfbox.util
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class PdfSplitting {

    public static void main(String[] args) throws IOException {
        File file = new File("D:/test.pdf");
        PDDocument document = PDDocument.load(file);

        // Split the document into one PDDocument per page
        Splitter splitter = new Splitter();
        List<PDDocument> pages = splitter.split(document);

        // Save each page as its own file: D:/test1.pdf, D:/test2.pdf, ...
        int i = 1;
        for (PDDocument page : pages) {
            page.save("D:/test" + i++ + ".pdf");
            page.close();
        }
        System.out.println("PDF split successfully");
        document.close();
    }
}

Upvotes: 1

user3815093

Reputation: 9

For this, always use pdfbox 1.8.6 and fop 0.93.

    PDDocument doc = null;
    try {
        doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDPageContentStream contentStream = new PDPageContentStream(doc, page);

        PDFont pdfFont = PDType1Font.HELVETICA;
        float fontSize = 25;
        float leading = 1.5f * fontSize;

        PDRectangle mediabox = page.findMediaBox();
        float margin = 72;
        float width = mediabox.getWidth() - 2*margin;
        float startX = mediabox.getLowerLeftX() + margin;
        float startY = mediabox.getUpperRightY() - margin;

        String text = "Hello sir finally PDF is created : thanks"; 
        List<String> lines = new ArrayList<String>();
        int lastSpace = -1;
        while (text.length() > 0)
        {
            int spaceIndex = text.indexOf(' ', lastSpace + 1);
            if (spaceIndex < 0)
            {
                lines.add(text);
                text = "";
            }
            else
            {
                String subString = text.substring(0, spaceIndex);
                float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
                if (size > width)
                {
                    if (lastSpace < 0) // So we have a word longer than the line... draw it anyways
                        lastSpace = spaceIndex;
                    subString = text.substring(0, lastSpace);
                    lines.add(subString);
                    text = text.substring(lastSpace).trim();
                    lastSpace = -1;
                }
                else
                {
                    lastSpace = spaceIndex;
                }
            }
        }

        contentStream.beginText();
        contentStream.setFont(pdfFont, fontSize);
        contentStream.moveTextPositionByAmount(startX, startY);            
        for (String line: lines)
        {
            contentStream.drawString(line);
            contentStream.moveTextPositionByAmount(0, -leading);
        }
        contentStream.endText(); 
        contentStream.close();

        doc.save("E:\\document.pdf");
    } catch (Exception exp) {
        logger.error("[GetInformation] email id is " + exp);
    } finally {
        if (doc != null) {
            try {
                doc.close();
            } catch (Exception expe) {
                logger.error("[GetInformation] email id is " + expe);
            }
        }
    }

Upvotes: 0

Maximin

Reputation: 1685

The problem is with this line:

PDDocument pdfDocument = PDDocument.load(new File("text.pdf"));

Specify the full path to text.pdf there, not just the file name.

Without knowing where the file resides, how is the JVM supposed to create the File object? That is why the exception occurs. Give the full path there and you are good to go.
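
To rule out a path problem before PDFBox is involved at all, it can help to print where a relative name like "text.pdf" actually resolves. Here is a minimal sketch using only the JDK; the class name PdfPathCheck and its helper methods are made up for illustration and are not part of PDFBox. A relative path is resolved against the JVM's working directory (the user.dir property), which in an IDE is often the project root rather than the folder containing the class.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfPathCheck {

    /** Resolves a possibly relative name against the JVM working directory. */
    public static Path resolve(String name) {
        return Paths.get(name).toAbsolutePath().normalize();
    }

    /** Returns the file if it exists and is readable; otherwise throws, naming the resolved path. */
    public static File requireReadable(String name) throws IOException {
        Path p = resolve(name);
        if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
            throw new FileNotFoundException("Not a readable file: " + p);
        }
        return p.toFile();
    }

    public static void main(String[] args) {
        // "text.pdf" is the file name from the question; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "text.pdf";
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Resolved path:     " + resolve(name));
        try {
            File f = requireReadable(name);
            System.out.println("File found, safe to hand to PDDocument.load: " + f);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the resolved path is the one you expect and the file is reported as found, the path is probably not the culprit.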

Update

It seems to be a bug that has been fixed in later versions.
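
One thing to watch for when upgrading: the stack trace in the question shows the old org.pdfbox packages, while the Apache releases (1.x and later) use org.apache.pdfbox, so the import statements have to change along with the jar. A rough sketch, JDK only, to check which flavour is visible on the classpath; PdfBoxVersionProbe and its methods are hypothetical names for illustration, not a PDFBox API:

```java
public class PdfBoxVersionProbe {

    /** Returns true if the named class can be found on the classpath (without initializing it). */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxVersionProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes which PDFBox package layout is visible, judged by where PDDocument lives. */
    public static String describe() {
        boolean legacy = isPresent("org.pdfbox.pdmodel.PDDocument");
        boolean apache = isPresent("org.apache.pdfbox.pdmodel.PDDocument");
        if (legacy && apache) {
            return "both org.pdfbox and org.apache.pdfbox (mixed classpath)";
        }
        if (apache) {
            return "org.apache.pdfbox (Apache release)";
        }
        if (legacy) {
            return "org.pdfbox (pre-Apache release such as 0.7.3)";
        }
        return "no PDFBox found";
    }

    public static void main(String[] args) {
        System.out.println("PDFBox on classpath: " + describe());
    }
}
```

Run it with the same classpath as your application. If it reports a mixed classpath, remove the old jar before changing the imports.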

Upvotes: 0
