Yamcha_Kippur

Reputation: 117

Puppeteer getting response from pdf download link

I'm automating regression tests for a website, and one of the tasks is to verify PDF downloads. I'm using Puppeteer and Chromium for this. I've found that downloading files in headless mode is rather difficult. Instead of downloading the file, I thought it would be simpler to check for a response from the download URL and read the file size from its headers. My issue: when I try to navigate to the URL, nothing seems to happen, and I eventually receive a timeout error. Here is the code I'm attempting to use:

const filename = new RegExp('\S*(\.pdf)');
await page.waitForSelector('#download-pdf', {timeout: timeout});
console.log('Clicking on "Download PDF" button');
const link = await page.$eval('#download-pdf', el => el.href);
await Promise.all([
    page.goto(link),
    page.on('response', response => {
        if(response._headers['content-disposition'] === `attachment;filename=${filename}`){
            console.log('Size: ', response._headers['content-length']);
        }
    })
]);

EDIT

If anyone understands why page.goto() appears to ignore .pdf URLs, that would be very useful to me.

Let me define the problem better. Clicking the "Download PDF" button on the webpage triggers an event that generates the PDF file and sends the user to a unique URL. This URL is destroyed after a short period. To get to this point, I believe I must use page.click() to trigger the event and generate the URL. However, page.click() also attempts to navigate to the PDF URL, and that navigation is rejected in headless mode. What I need to do is get the URL and test for a response from it.

Upvotes: 4

Views: 3714

Answers (1)

Yamcha_Kippur

Reputation: 117

I figured out a solution. I'll post it here for anyone else who runs into a similar problem. The idea is to register an event listener that fires for every response the page receives. Since I only care about responses from URLs ending in .pdf, I only act on those.

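// Register this listener before triggering the download (e.g. before page.click()).
page.on('response', intercept => {
    // Only act on responses whose URL ends with .pdf
    if (intercept.url().endsWith('.pdf')) {
        console.log(intercept.url());
        console.log('HTTP status code: %d', intercept.status());
        console.log(intercept.headers());
    }
});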
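
For anyone who wants to assert on the result instead of just logging it, here is a rough sketch of the same idea built on page.waitForResponse(). It is not what I originally ran, so treat it as a starting point. The helper names isPdfResponse and clickAndCapturePdf are made up for illustration, the 30 second default timeout is arbitrary, and it assumes page.click() itself resolves normally on your site. It also accepts a Content-Type of application/pdf in case the generated URL does not end in .pdf.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page object.
// isPdfResponse and clickAndCapturePdf are hypothetical helper names.

// True when a response looks like a PDF, judged by URL path or Content-Type.
function isPdfResponse(response) {
    const headers = response.headers(); // Puppeteer lower-cases header names
    const contentType = headers['content-type'] || '';
    const pathname = new URL(response.url()).pathname; // ignores any query string
    return pathname.toLowerCase().endsWith('.pdf') ||
        contentType.includes('application/pdf');
}

// Click the element and resolve with basic facts about the first PDF response.
async function clickAndCapturePdf(page, selector, timeout = 30000) {
    // Start waiting before clicking so the response cannot be missed.
    const [response] = await Promise.all([
        page.waitForResponse(isPdfResponse, { timeout }),
        page.click(selector),
    ]);
    const headers = response.headers();
    return {
        url: response.url(),
        status: response.status(),
        // Content-Length may be absent (e.g. chunked transfer), so report null then.
        size: 'content-length' in headers ? Number(headers['content-length']) : null,
        disposition: headers['content-disposition'] || null,
    };
}
```

In a test you could then call `const info = await clickAndCapturePdf(page, '#download-pdf');` and check `info.status` and `info.size` with whatever assertion library you use.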

Upvotes: 5
