Bugasur

Reputation: 101

How to read a PDF file using Selenium

I am working on a web page that has a link; clicking it opens a PDF file in a new window. I have to read that PDF file to validate some data against the transactions done. One way is to download the file and then use it. Can anyone help me out with this? I have to work on IE 11.

Thanks in advance.

Upvotes: 1

Views: 35629

Answers (2)

Ankit Gupta

Reputation: 786

First, download the PDFBox jar.

strURL is a web URL that points to a .pdf file, for example: https://example.com/downloads/presence/Online-Presence-CA-05-02-2017-04-13.pdf

    // Imports needed (PDFBox 2.x; PDDocument.load was removed in PDFBox 3.x)
    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import java.net.URL;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    public boolean verifyPDFContent(String strURL, String text) {
        String output = "";
        // try-with-resources closes both the network stream and the document
        try (InputStream file = new BufferedInputStream(new URL(strURL).openStream());
             PDDocument document = PDDocument.load(file)) {
            output = new PDFTextStripper().getText(document);
            System.out.println(output);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // true if the extracted text contains the expected value
        return output.contains(text);
    }
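PDFTextStripper usually inserts line breaks (and sometimes extra spaces) where the PDF layout wraps, so a plain output.contains(text) can fail for a value that visibly appears in the document. A minimal sketch of a whitespace-tolerant check using only the JDK; the class and method names are hypothetical, and the sample string merely stands in for real extracted text:

```java
import java.util.regex.Pattern;

class PdfTextCheck {
    // Matches any run of whitespace (spaces, tabs, \r, \n, ...)
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Collapses each whitespace run to a single space and trims the ends,
    // so a value split across lines in the extracted text still matches.
    static String normalize(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    // Hypothetical helper: true if `expected` occurs in `extracted`
    // once both strings are whitespace-normalized.
    static boolean containsNormalized(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for PDFTextStripper output
        String extracted = "Transaction ID: 12345\r\nAmount:   250.00\n";
        System.out.println(containsNormalized(extracted, "12345 Amount: 250.00")); // prints true
    }
}
```

With this helper, the last line of verifyPDFContent could become return PdfTextCheck.containsNormalized(output, text);.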

Upvotes: 1

Kenil Fadia

Reputation: 234

Use PDFBox and FontBox.

    // Imports needed (PDFBox 2.x, Selenium)
    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public String readPDFInURL() throws IOException {
        WebDriver driver = new FirefoxDriver();
        try {
            // page with example pdf document
            driver.get("file:///C:/Users/admin/Downloads/dotnet_TheRaceforEmpires.pdf");
            URL url = new URL(driver.getCurrentUrl());
            // try-with-resources closes the stream and the document
            try (InputStream fileToParse = new BufferedInputStream(url.openStream());
                 PDDocument document = PDDocument.load(fileToParse)) {
                return new PDFTextStripper().getText(document);
            }
        } finally {
            driver.quit(); // close the browser even if parsing fails
        }
    }

Since some functions from older PDFBox versions have been deprecated, the matching FontBox jar is needed along with PDFBox. I have used PDFBox 2.0.3 with FontBox 2.0.3 and it works fine. Note that it extracts only text; it won't read images.
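Because the link in the question opens the PDF in a new window, the driver has to switch to that window before driver.getCurrentUrl() returns the PDF's address. Selenium lists open windows via driver.getWindowHandles() (a Set<String>) and switches with driver.switchTo().window(handle). The sketch below shows only the set arithmetic for picking the newly opened handle; the class name, method name and handle values are hypothetical:

```java
import java.util.LinkedHashSet;
import java.util.Set;

class WindowHandles {
    // Returns the handle present in `after` but not in `before`, i.e. the
    // window opened by the click. Throws unless exactly one window was added.
    static String findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        if (added.size() != 1) {
            throw new IllegalStateException("Expected exactly one new window, found " + added.size());
        }
        return added.iterator().next();
    }

    public static void main(String[] args) {
        // Hypothetical values; in a real test these come from driver.getWindowHandles()
        Set<String> before = Set.of("main");
        Set<String> after = Set.of("main", "pdf-window");
        System.out.println(findNewHandle(before, after)); // prints pdf-window
    }
}
```

In the test itself, one would capture the handles before clicking the link, wait until a new handle appears, switch to findNewHandle(before, driver.getWindowHandles()), and pass driver.getCurrentUrl() to verifyPDFContent. Whether IE 11 reports the PDF's URL for that window is an assumption here; reading the link's href attribute is an alternative. Also, url.openStream() does not send the browser's session cookies, so a PDF that requires login may not download this way.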

Upvotes: 4
