Reputation: 89

Best Way to Read PDF generated with Selenium WebDriver

Good afternoon,

I'm stuck on the last step of a test: after I enter a series of values, the site generates a PDF payment guide.

I need to capture the information that is highlighted in green.

Here is the markup shown when inspecting the page source:

<embed id="plugin" type="application/x-google-chrome-pdf" src="https://secweb.procergs.com.br/sng/javax.faces.resource/dynamiccontent.properties.xhtml?ln=primefaces&amp;v=5.3.17&amp;pfdrid=a9fc559a-bea3-4bc2-8234-5543c59715cc&amp;pfdrt=sc&amp;pfdrid_c=false&amp;uid=e483b7ac-35d3-429e-9c84-c5db516f1b8c" stream-url="blob:chrome-extension://mhjfbmdgcfjbbpaeojofohoefgiehjai/3173c884-d121-48c6-b417-5972f907fe9e" headers="Cache-Control: no-cache, no-store, must-revalidate
Connection: Keep-Alive
Content-Encoding: gzip
Content-Language: pt-br
Content-Type: application/pdf; charset=UTF-8
Date: Mon, 03 Sep 2018 20:26:44 GMT
Expires: Mon, 8 Aug 1980 10:00:00 GMT
Keep-Alive: timeout=16, max=1021
Pragma: no-cache
Server: Apache
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-UA-Compatible: IE=Edge
" background-color="0xFF525659" top-toolbar-height="56" top-level-url="undefined">

I could not even get past the first step, which is to detect that the PDF is on screen by looking for some text that is unique to it:

if (driver0.getPageSource().contains("SECRETARIA DE MODERNIZAÇÃO ADMINISTRATIVA E DOS RECURSOS HUMANOS")) {
    System.out.println("Located, we will capture the information ...");
} else {
    System.out.println("Not found...");
}

Update: I was not successful with the PDFUtil library. I added it to the project, but it does not work.

Here is my initial test:

try {
    PDFUtil pdfUtil = new PDFUtil();
    pdfUtil.getText("C://64914273.pdf");
} catch (Exception ex) {
    System.out.println(ex);
}

The console simply prints nothing.

Thanks to anyone who can help.

Upvotes: 0

Answers (2)

JackForman

Reputation: 61

I'd assume that once the information is in the PDF, reading it back would only be possible with an OCR library, and those are often very brittle.

What I would do is work out the scope of your test and, if you can, separate your tests out.

First, an automated test: check that the information the browser sends in the HTTP request when you click submit is correct. A simple proxy such as BrowserMob should be enough to intercept the request.

Second, a manual test: check that the PDF producer displays the information correctly when it receives it.

Your automated test would then finish once the information has been sent and checked, and the manual test would only be run when a change introduces risk to the PDF producer.
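As a rough illustration of the automated half, here is a minimal sketch of the comparison step only. It assumes the proxy hands you the request body as a form-encoded string. The class name, method names and the field names `nome` and `valor` are hypothetical, and the proxy setup itself is not shown.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares a captured form submission with the values the test typed in.
class SubmittedFormCheck {

    /** Parse an application/x-www-form-urlencoded body into name -> value (last value wins). */
    static Map<String, String> parseFormBody(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    /** Return each expected field whose submitted value is missing or different, mapped to what was sent. */
    static Map<String, String> mismatches(Map<String, String> expected, Map<String, String> submitted) {
        Map<String, String> wrong = new LinkedHashMap<>();
        expected.forEach((name, value) -> {
            if (!value.equals(submitted.get(name))) {
                wrong.put(name, String.valueOf(submitted.get(name)));
            }
        });
        return wrong;
    }

    public static void main(String[] args) {
        // Hypothetical captured body and expected values, for illustration only.
        Map<String, String> submitted = parseFormBody("nome=Maria+Silva&valor=10%2C50");
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("nome", "Maria Silva");
        expected.put("valor", "10,50");
        System.out.println("Mismatched fields: " + mismatches(expected, submitted));
    }
}
```

The test passes when `mismatches` returns an empty map. If the site posts JSON or multipart data instead of a form-encoded body, the parsing step would need to change accordingly.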

Upvotes: 0

vins

Reputation: 15400

One option is to save the PDF, read its content with a PDF library, and parse out the text you are looking for.

Take a look at PDFUtil and its examples:

http://www.testautomationguru.com/introducing-pdfutil-to-compare-pdf-files-extract-resources/
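The `getPageSource()` check in the question most likely fails because the page source only holds the `<embed>` tag. The PDF text itself is rendered by the viewer and never becomes part of the HTML. Below is a minimal sketch of the "save the PDF" step using only the JDK (Java 11+). It assumes the `src` URL of the embed can be fetched again with the browser's session cookies, which may not hold if the server treats that URL as single-use. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: fetches the PDF behind the <embed> src and saves it to disk.
class PdfDownloadSketch {

    /** Undo HTML attribute escaping: inspected markup shows "&amp;" where the real URL has "&". */
    static String unescapeSrc(String src) {
        return src.replace("&amp;", "&");
    }

    /** Build a Cookie header value from name/value pairs taken from the browser session. */
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        cookies.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    /** A PDF file starts with the ASCII bytes "%PDF-"; anything else is probably an error page. */
    static boolean looksLikePdf(byte[] body) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (body == null || body.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (body[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Download the document with the given session cookies and write it to target. */
    static Path download(String src, Map<String, String> cookies, Path target)
            throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(unescapeSrc(src))).GET();
        if (!cookies.isEmpty()) {
            builder.header("Cookie", cookieHeader(cookies));
        }
        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(builder.build(), HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200 || !looksLikePdf(response.body())) {
            throw new IOException("Response was not a PDF, status " + response.statusCode());
        }
        return Files.write(target, response.body());
    }

    public static void main(String[] args) {
        // Offline demonstration of the helpers; the cookie value is made up.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("JSESSIONID", "example");
        System.out.println(cookieHeader(cookies));
        System.out.println(looksLikePdf("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

In the Selenium test, the URL would come from `driver.findElement(By.id("plugin")).getAttribute("src")` and the cookies from `driver.manage().getCookies()`. The saved path can then be passed to `pdfUtil.getText(...)` as in the question. That snippet discards the return value, which would explain the empty console if `getText` returns the extracted text as a `String`, so print or assert on the result.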

Upvotes: 1
