Reputation: 3833
I'm frustrated with the PDFBox API.
I have done:
PDDocument pdfDocument = PDDocument.load(new File("text.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
String s = stripper.getText(pdfDocument);
pdfDocument.close();
but I'm getting a
Exception in thread "main" java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at lucene.test.main(test.java:47)
at this line:
String s = stripper.getText(pdfDocument);
I have absolutely no idea why. Creating a PDF by following the tutorial works fine (http://pdfbox.apache.org/cookbook/textextraction.html), but this text extraction does not. I have already searched a lot, but nothing helped.
By the way, I am still using "pdfbox-0.7.3.jar" because the newer "pdfbox-1.8.2.jar" didn't work for me. Could this be the reason?
Thanks for the help.
PS: I get the same error when using "stripper.writeText()".
Upvotes: 0
Views: 2564
Reputation: 3833
Instead of
PDDocument pdfDocument = PDDocument.load(new File("text.pdf"));
just use
PDDocument pdfDocument = PDDocument.load("C:\\TEMP\\text.pdf");
I'm not sure why, but it works for me now, even with the old 0.7.3 version of PDFBox.
Upvotes: 2
Reputation: 11
Add the following external JARs:
pdfbox 2.x (the org.apache.pdfbox.multipdf package imported below is not part of pdfbox-1.3.1)
commons-logging-1.2
Java Code:
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class PdfSplitting {
    public static void main(String[] args) throws IOException {
        File file = new File("D:/test.pdf");
        PDDocument document = PDDocument.load(file);

        // Split the document into one PDDocument per page.
        Splitter splitter = new Splitter();
        List<PDDocument> pages = splitter.split(document);

        // Save each page as D:/test1.pdf, D:/test2.pdf, ... and release it.
        int i = 1;
        for (PDDocument page : pages) {
            page.save("D:/test" + i++ + ".pdf");
            page.close();
        }
        System.out.println("PDF split successfully");
        document.close();
    }
}
Upvotes: 1
Reputation: 9
For this, always use pdfbox 1.8.6 and fop 0.93.
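import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// The class name and the commons-logging logger are placeholders; use your own.
public class CreatePdf {
    private static final Log logger = LogFactory.getLog(CreatePdf.class);

    public static void main(String[] args) {
        PDDocument doc = null;
        try {
            doc = new PDDocument();
            PDPage page = new PDPage();
            doc.addPage(page);
            PDPageContentStream contentStream = new PDPageContentStream(doc, page);

            PDFont pdfFont = PDType1Font.HELVETICA;
            float fontSize = 25;
            float leading = 1.5f * fontSize;

            // Usable text area: the page's media box minus a 72 pt (1 inch) margin.
            PDRectangle mediabox = page.findMediaBox();
            float margin = 72;
            float width = mediabox.getWidth() - 2 * margin;
            float startX = mediabox.getLowerLeftX() + margin;
            float startY = mediabox.getUpperRightY() - margin;

            String text = "Hello sir finally PDF is created : thanks";

            // Break the text into lines at the last space that still fits the width.
            List<String> lines = new ArrayList<String>();
            int lastSpace = -1;
            while (text.length() > 0) {
                int spaceIndex = text.indexOf(' ', lastSpace + 1);
                if (spaceIndex < 0) {
                    // No more spaces: the rest of the text is the last line.
                    lines.add(text);
                    text = "";
                } else {
                    String subString = text.substring(0, spaceIndex);
                    // getStringWidth returns 1/1000 of a text-space unit.
                    float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
                    if (size > width) {
                        if (lastSpace < 0) {
                            // A single word is longer than the line: draw it anyway.
                            lastSpace = spaceIndex;
                        }
                        subString = text.substring(0, lastSpace);
                        lines.add(subString);
                        text = text.substring(lastSpace).trim();
                        lastSpace = -1;
                    } else {
                        lastSpace = spaceIndex;
                    }
                }
            }

            // Draw the lines top-down, moving one leading per line.
            contentStream.beginText();
            contentStream.setFont(pdfFont, fontSize);
            contentStream.moveTextPositionByAmount(startX, startY);
            for (String line : lines) {
                contentStream.drawString(line);
                contentStream.moveTextPositionByAmount(0, -leading);
            }
            contentStream.endText();
            contentStream.close();

            doc.save("E:\\document.pdf");
        } catch (Exception exp) {
            logger.error("Failed to create the PDF", exp);
        } finally {
            if (doc != null) {
                try {
                    doc.close();
                } catch (Exception expe) {
                    logger.error("Failed to close the PDF document", expe);
                }
            }
        }
    }
}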
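The line-breaking part of the code above can be tried out without PDFBox. Below is a minimal sketch of the same greedy idea (keep adding words while the line still fits), not the exact loop from the answer. The names GreedyWrap, wrap and measure are made up for illustration, and the width function is passed in so that any measurement can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of PDFBox: greedy word wrapping with a pluggable width function.
final class GreedyWrap {
    /**
     * Splits text into lines whose measured width stays within maxWidth where possible.
     * A single word wider than maxWidth is kept on a line of its own.
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && measure.applyAsDouble(candidate) > maxWidth) {
                // The next word does not fit: finish the current line and start a new one.
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            } else {
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

For a quick check, a character count can stand in for the font metrics, for example GreedyWrap.wrap(text, 12, s -> s.length()). With PDFBox 1.8 the width function would presumably be fontSize * pdfFont.getStringWidth(s) / 1000 as in the answer, wrapped so that its IOException is handled.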
Upvotes: 0
Reputation: 1685
The problem is with this line
PDDocument pdfDocument = PDDocument.load(new File("text.pdf"));
Specify the full path to text.pdf there.
A relative name such as "text.pdf" is resolved against the JVM's working directory, which may not be the folder the file actually lives in. Give the absolute path there, and you should be good to go.
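To see which location a name like "text.pdf" really points to before handing it to PDFBox, a small check with plain java.io.File can help. This is only a sketch; PdfPathCheck, resolve and describe are made-up names, and it does not touch PDFBox at all.

```java
import java.io.File;

// Hypothetical helper: shows where a (possibly relative) path resolves to and whether it is readable.
final class PdfPathCheck {
    /** Resolves the path against the JVM's working directory (the user.dir system property). */
    static File resolve(String path) {
        return new File(path).getAbsoluteFile();
    }

    /** Returns a one-line description of what was found at the resolved location. */
    static String describe(String path) {
        File file = resolve(path);
        if (file.isFile() && file.canRead()) {
            return "OK: " + file.getPath();
        }
        return "Missing or unreadable: " + file.getPath();
    }

    public static void main(String[] args) {
        // Defaults to the file name from the question when no argument is given.
        System.out.println(describe(args.length > 0 ? args[0] : "text.pdf"));
    }
}
```

If the printed location is not where the PDF is stored, either pass the absolute path or start the program from the folder that contains the file.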
Update
It seems to be a bug that has been fixed in later versions.
Upvotes: 0