java PdfTextExtractor.getTextFromPage(Unknown Source)

Question

Hello i have trouble with parsing a pdf when the iterator reaches page 11 an exception is throw.

Any ideas? Thanks

Here's my code:

import java.io.*;
import java.nio.charset.Charset;
import java.util.regex.*;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.hyphenation.TernaryTree.Iterator;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

public class PdfParser {
    /** 
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        int index = 0;
        try {
            PdfReader readerN = new PdfReader("C:\Documents and Settings\stefan.stere\hibernateWorkspace\PdfParser\src\monitor3.pdf");
            OutputStreamWriter out = new OutputStreamWriter( new FileOutputStream(new File("C:\Documents and Settings\stefan.stere\hibernateWorkspace\PdfParser\src\pdf2txt.rtf")),"Cp1252");

            PdfTextExtractor parse = new PdfTextExtractor(readerN);
            int nrPages = readerN.getNumberOfPages();

            for (int i=1; i



and the exception stack:

java.lang.ArrayIndexOutOfBoundsException: Invalid index: 62
at com.lowagie.text.pdf.CMapAwareDocumentFont.decodeSingleCID(Unknown Source)
at com.lowagie.text.pdf.CMapAwareDocumentFont.decode(Unknown Source)
at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.decode(Unknown Source)
at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.displayPdfString(Unknown Source)
at com.lowagie.text.pdf.parser.PdfContentStreamProcessor$ShowTextArray.invoke(Unknown Source)
at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(Unknown Source)
at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.processContent(Unknown Source)
at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(Unknown Source)
at PdfParser.main(PdfParser.java:32)

pgras · Accepted Answer

No answer but it seems a lot of people have the same problem, there's another related question on SO. And if you search with ArrayIndexOutOfBoundsException and getTextFromPage on google you see the same problem but no solution...

BTW, your loop will stop before processing the last page as the first page has index 1...

java PdfTextExtractor.getTextFromPage(Unknown Source)

Answers (2)

Related Questions