Reputation: 1566
In a NodeJS v10.x.x environment, when trying to create a PDF page from some HTML code, I'm getting a closed page issue every time I try to do something with it (setCacheEnabled, setRequestInterception, etc...):
async (page, data) => {
try {
const {options, urlOrHtml} = data;
const finalOptions = { ...config.puppeteerOptions, ...options };
// Set caching flag (if provided)
const cache = finalOptions.cache;
if (cache != undefined) {
delete finalOptions.cache;
await page.setCacheEnabled(cache); //THIS LINE IS CAUSING THE PAGE TO BE CLOSED
}
// Setup timeout option (if provided)
let requestOptions = {};
const timeout = finalOptions.timeout;
if (timeout != undefined) {
delete finalOptions.timeout;
requestOptions.timeout = timeout;
}
requestOptions.waitUntil = 'networkidle0';
if (urlOrHtml.match(/^http/i)) {
await page.setRequestInterception(true); //THIS LINE IS CAUSING ERROR DUE TO THE PAGE BEING ALREADY CLOSED
page.once('request', request => {
if(finalOptions.method === "POST" && finalOptions.payload !== undefined) {
request.continue({method: 'POST', postData: JSON.stringify(finalOptions.payload)});
}
});
// Request is for a URL, so request it
await page.goto(urlOrHtml, requestOptions);
}
return await page.pdf(finalOptions);
} catch (err) {
logger.info(err);
}
};
I read somewhere that this issue could be caused due to some await missing, but that doesn't look like my case.
I'm not using directly puppeteer, but this library that creates a cluster on top of it and handles processes:
https://github.com/thomasdondorf/puppeteer-cluster
Upvotes: 2
Views: 5231
Reputation: 25230
You already gave the solution, but as this is a common problem with the library (I'm the author 🙂) I would like to provide some more insights.
When a job is queued and ready to be executed, puppeteer-cluster will create a page and call the task function (given to cluster.task
) with the created page
object and the queued data. The cluster then waits until the Promise is finished (fulfilled or rejected) and will close the page and execute the next job in the queue.
As an async-function is implicitly creating a Promise, this means as soon as the async-function given to the cluster.task
function is finished, the page is closed. There is no magic happening to determine if the page might be used in the future.
Below is a code sample with a common mistake. The user might want to wait for an external event before closing the page as in the (not working) example below:
Non-working (!) code sample:
await cluster.task(async ({ page, data }) => {
await page.goto('...');
setTimeout(() => { // user is waiting for an asynchronous event
await page.evaluate(/* ... */); // Will throw an error as the page is already closed
}, 1000);
});
In this code, the page is already closed before the asynchronous function is executed. To correct way to do this would be to return a Promise instead.
Working code sample:
await cluster.task(async ({ page, data }) => {
await page.goto('...');
// will wait until the Promise resolves
await new Promise(resolve => {
setTimeout(() => { // user is waiting for an asynchronous event
try {
await page.evalute(/* ... */);
resolve();
} catch (err) {
// handle error
}
}, 1000);
});
});
In this code sample, the task function waits until the inner promise is resolved until it resolves the function. This will keep the page open until the asynchronous function calls resolve
. In addition, the code uses a try..catch
block as the library is not able to catch events thrown inside asynchronous code blocks.
Upvotes: 4
Reputation: 1566
I got it.
I was indeed forgetting an await to the call that was made to the function I posted.
That call was in another file that I use fot the cluster instance creation:
async function createCluster() {
//We will protect our app with a Cluster that handles all the processes running in our headless browser
const cluster = await Cluster.launch({
concurrency: Cluster[config.cluster.concurrencyModel],
maxConcurrency: config.cluster.maxConcurrency
});
// Event handler to be called in case of problems
cluster.on('taskerror', (err, data) => {
console.log(`Error on cluster task... ${data}: ${err.message}`);
});
// Incoming task for the cluster to handle
await cluster.task(async ({ page, data }) => {
main.postController(page, data); // <-- I WAS MISSING A return await HERE
});
return cluster;
}
Upvotes: 1