Reputation: 2434
I'm trying to get individual tasks to throw a time-out during stress testing to see what my calling program will do. However, my cluster keeps tasks fresh indefinitely. It appears to queue all my cluster.execute calls which then are kept in memory and return their results to listeners that have long since disconnected.
The docs state:
timeout <number> Specify a timeout for all tasks. Defaults to 30000 (30 seconds).
My cluster launch configuration:
const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_CONTEXT,
  maxConcurrency: 1,
  timeout: 1000 // milliseconds
});
I'm calling the queuing mechanism using:
const pdf = await cluster.execute(html, makePdf);
Where makePdf is an async function that expects an HTML string, fills a page with it and prints a PDF using the default puppeteer.
const makePdf = async ({ page, data: html, worker }) => {
  await page.setContent(html);
  let pdf = await page.pdf({});
  console.log('worker ' + worker.id + ' task ' + count);
  return pdf;
};
I sort of expected the queue to start emptying itself until it found a task that didn't exceed its timeout value. I've tried setting the timeout to 1 ms but this doesn't trigger a timeout either. I've tried moving this code to a cluster.task as described in the examples to see if that would trigger the setting, but no such luck. How do I get already queued requests to time out? Does this even work if I'm not scraping websites or connecting to anything?
I'm considering passing a timestamp along with my tasks so that a task can skip doing anything for requests that have expired on the calling side, but I'd rather use built-in options wherever possible.
EDIT:
Thanks to Thomas's clarification I've decided to build this little optimization to prevent tasks where the listeners are long gone from executing.
Swap the content of data from just html to an object that has both the html and a timestamp:
let timestamp = new Date();
await cluster.execute({html, timestamp});
Ignore any queued task where the listener has timed out:
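const makePdf = async ({ page, data: { html, timestamp }, worker }) => {
  // Milliseconds elapsed since the caller created this task
  let time_since_call = (new Date() - timestamp);
  // timeout_ms is the caller-side timeout, defined elsewhere.
  // Expired tasks fall through and resolve with undefined.
  if (time_since_call < timeout_ms) {
    await page.setContent(html);
    let pdf = await page.pdf({});
    return pdf;
  }
};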
Upvotes: 2
Views: 4487
Reputation: 25280
This is a misunderstanding of what timeout does. The timeout option is the timeout for the task, meaning that the job itself (after leaving the queue) cannot take longer than the specified timeout. The option does not cancel a job that is still waiting in the queue.
Example:
const cluster = await Cluster.launch({
  // ...
  maxConcurrency: 1,
  timeout: 1000 // one second
});
// ...
for (let i = 0; i < 10; i += 1) {
  cluster.queue('...');
}
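This code adds 10 jobs and runs them sequentially (as maxConcurrency is 1). There is no difference between queue and execute here (see this question for more information on this topic). So what happens is the following: all 10 jobs are queued immediately and run one after another, and each job gets its own one-second timeout that only starts counting once the job leaves the queue.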
The use case you are describing is currently not supported by the library (btw, disclaimer: I'm the author), but as you proposed, you could add a timestamp to the object you are queuing and cancel the job right away if it is too far in the past.
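One way to package the timestamp idea is a small wrapper around the task function. This is a sketch only: withExpiry, queuedAt, payload and the injectable now parameter are hypothetical names chosen for illustration and are not part of the library. The only library call assumed is cluster.execute(data, taskFunction), as used in the question.

```javascript
// Hypothetical helper (not part of puppeteer-cluster): wraps a task function so
// that jobs which waited longer than maxAgeMs are rejected before doing any work.
function withExpiry(taskFn, maxAgeMs, now = () => Date.now()) {
  return async (args) => {
    const { queuedAt, payload } = args.data;
    const ageMs = now() - queuedAt;
    if (ageMs > maxAgeMs) {
      throw new Error(
        `Job expired: waited ${ageMs} ms in the queue (limit ${maxAgeMs} ms)`
      );
    }
    // Pass the original payload on, keeping page and worker untouched
    return taskFn({ ...args, data: payload });
  };
}

// Possible usage, assuming the makePdf task from the question:
// const pdf = await cluster.execute(
//   { queuedAt: Date.now(), payload: html },
//   withExpiry(makePdf, 1000)
// );
```

Throwing instead of silently returning undefined lets the caller tell an expired job apart from a finished one, at the cost of having to catch the error on the calling side.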
Upvotes: 2