Matt
Matt

Reputation: 616

Importing PDF to String in java

i need to extract text from a pdf file using java. I found iText but it doesn't work the way i wanted it to. Here's my code

package com.itextpdf.mavenproject1;


import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfButtonFormField;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.io.font.FontConstants;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.action.PdfAction;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfTextAnnotation;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.test.annotations.WrapToTest;
import java.io.File;
import java.io.IOException;

public class zczytywanie {

    public static void main(String args[]) throws IOException {


       PdfDocument pdfDoc = new PdfDocument(new PdfReader("D:/pdf/pdf"));

       String page= PdfTextExtractor.getTextFromPage(pdfDoc, 1);

       System.out.println(page);

    }
}

And it tells me that there is an error in the line where i try to use PDdfTextExtractor (PdfDocument can not be converted to pdfPage, although i found that pdfDoc has to be PdfReader)

It doesn't work with

PdfReader pdfDoc = new PdfReader("D:/pdf/pdf");

either.

Upvotes: 0

Views: 1615

Answers (1)

Panda1667075
Panda1667075

Reputation: 147

You can try PDFBox or Tikka. But here I am giving an example for PDFBox

Add the PDFBox jar dependency to your pom.xml.

<dependencies>
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.23</version>
        </dependency>
</dependencies>

Java class

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.io.File;
import java.io.IOException;

    public class TestPDF {
        public static void main(String[] args) {
           try (PDDocument document = PDDocument.load(new File("/path_to_your_pdf_file"))) {
               document.getClass();
    
               if(!document.isEncrypted()){
                   PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                   stripper.setSortByPosition(true);
    
                   PDFTextStripper tStripper = new PDFTextStripper();
                   String pdfFileInText = tStripper.getText(document);
                   System.out.println("Text:" + pdfFileInText);
                   
               }
           } catch (IOException e) {
               e.printStackTrace();
           }
        }
    }

Upvotes: 0

Related Questions