moh17

Reputation: 243

Assert text in PDF using PDFBox - Selenium/java

I use the method below to check whether a given text is present in a PDF file that I have downloaded.

public void iShouldVerify() throws Throwable {
    export_inspections.verifyPDFContent("zzz");
}

public boolean verifyPDFContent(String reqTextInPDF) {

boolean flag = false;

PDFTextStripper pdfStripper = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
String parsedText = null;

try {
    File file = new File("/Users/mohand/Downloads/1956_ANewChecklistTemplate1Updated_BigTurnip_270618.pdf");
    PDFParser parser = new PDFParser(new FileInputStream(file));

    parser.parse();
    cosDoc = parser.getDocument();
    pdfStripper = new PDFTextStripper();
    pdfStripper.setStartPage(1);
    pdfStripper.setEndPage(1);

    pdDoc = new PDDocument(cosDoc);
    parsedText = pdfStripper.getText(pdDoc);
} catch (MalformedURLException e2) {
    System.err.println("URL string could not be parsed " + e2.getMessage());
} catch (IOException e) {
    System.err.println("Unable to open PDF Parser. " + e.getMessage());
    try {
        if (cosDoc != null)
            cosDoc.close();
        if (pdDoc != null)
            pdDoc.close();
    } catch (Exception e1) {
        e.printStackTrace();
    }
}

System.out.println("+++++++++++++++++");
System.out.println(parsedText);
System.out.println("+++++++++++++++++");
System.out.println(reqTextInPDF);


if (parsedText.contains(reqTextInPDF)) {
    flag = true;
}

return flag;
}

The problem is that the test passes even when the text "zzz" does not appear anywhere in the PDF.

How do I assert this? Or is there a better way to deal with this?

Upvotes: 2

Views: 3507

Answers (1)

draxil

Reputation: 64

Try this simplified version:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class X {
    public static boolean verifyPDFContent(String reqTextInPDF) throws IOException {
        // try-with-resources closes the document even if text extraction throws
        try (PDDocument doc = PDDocument.load(new File("test.pdf"))) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            String text = pdfStripper.getText(doc);
            System.out.println(text);
            return text.contains(reqTextInPDF);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(verifyPDFContent("Charity"));
    }
}

This works for me. I wasn't 100% able to tell which PDFBox version you are using, so if this doesn't compile we may be on different versions (I'm on 2.0.3).
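Also note that the step in the question calls verifyPDFContent("zzz") and throws the returned boolean away, so nothing can ever fail the test. The result has to be asserted. Here is a minimal sketch of that idea in plain Java, without PDFBox, so it runs on its own. The class and method names (PdfTextCheck, containsText, assertPdfContains) are made up for illustration, and the hard-coded string stands in for whatever pdfStripper.getText(doc) returns. In a real JUnit-based step you would more likely wrap the call in the framework's assertTrue.

```java
class PdfTextCheck {

    // Null-safe containment check: if parsing failed and the text is null,
    // treat it as "not found" instead of throwing a NullPointerException.
    static boolean containsText(String parsedText, String required) {
        return parsedText != null && required != null && parsedText.contains(required);
    }

    // Fails loudly when the text is missing, which is what marks the test as failed.
    static void assertPdfContains(String parsedText, String required) {
        if (!containsText(parsedText, required)) {
            throw new AssertionError("Expected PDF text to contain \"" + required + "\"");
        }
    }

    public static void main(String[] args) {
        // Stand-in for the text extracted from the PDF (assumption for this sketch).
        String parsedText = "Charity inspection report";

        assertPdfContains(parsedText, "Charity"); // text is present, no error

        try {
            assertPdfContains(parsedText, "zzz"); // text is absent, throws
        } catch (AssertionError e) {
            System.out.println("Failed as expected: " + e.getMessage());
        }
    }
}
```

The null check matters for the code in the question too: if the PDF cannot be read, parsedText stays null and parsedText.contains(...) would throw a NullPointerException rather than report a clean failure.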

Upvotes: 1
