Reputation: 99
I know that node relies on a single event thread. So there is no way for it to have parallel threads. But async.parallel does provide parallel-like functionality. Another question on Stack implies that async.parallel is using process.nextTick. So essentially async.parallel is still just a concurrent function rather than a true parallel function?
Upvotes: 1
Views: 2361
Reputation: 3055
I know this doesn't really answer the question, but I wrote a post on using async.parallel with worker_threads, https://tech.beyondtracks.com/posts/node-worker-threads-with-async-parallel/, which may help. The gist: you may traditionally be using async.parallel (here, its parallelLimit variant) like this:
const os = require('os')
const parallelLimit = require('async/parallelLimit')
// our CPU intensive operation
function fibonacci(n) {
  if (n < 2) {
    return 1
  } else {
    return fibonacci(n - 2) + fibonacci(n - 1)
  }
}
// number of jobs to run will be the number of CPU cores
const limit = os.cpus().length
const fibonacciSize = 40
// build a set of tasks for parallelLimit to run
const tasks = Array.from({ length: limit }, (v, k) => k + 1).map((task) => {
  return cb => {
    const result = fibonacci(fibonacciSize)
    cb(null, result)
  }
})
// run tasks with parallelLimit
parallelLimit(tasks, limit, (err, results) => {
  console.log('Finished with', err, results)
})
If you want to run those tasks in parallel across multiple threads using worker_threads then you'd do:
index.js:
const { Worker } = require('worker_threads')
const path = require('path')
const os = require('os')
const parallelLimit = require('async/parallelLimit')
// number of jobs to run will be the number of CPU cores
const limit = os.cpus().length
const workerScript = path.join(__dirname, './worker.js')
// build a set of tasks for parallelLimit to run
const tasks = Array.from({ length: limit }, (v, k) => k + 1).map((task) => {
  return cb => {
    const worker = new Worker(workerScript, { workerData: task })
    worker.on('message', (result) => { cb(null, result) })
    worker.on('error', (err) => { cb(err, null) })
  }
})
// run tasks with parallelLimit
parallelLimit(tasks, limit, (err, results) => {
  console.log('Finished with', err, results)
})
worker.js:
const { parentPort, workerData, isMainThread } = require('worker_threads')
// our CPU intensive operation
function fibonacci(n) {
  if (n < 2) {
    return 1
  } else {
    return fibonacci(n - 2) + fibonacci(n - 1)
  }
}
const fibonacciSize = 40
if (!isMainThread) {
  const result = fibonacci(fibonacciSize)
  parentPort.postMessage(result)
}
Upvotes: 1
Reputation: 707248
async.parallel just lets you start multiple asynchronous operations and then keeps track of when they are all done. How much they truly run in parallel depends entirely on what the asynchronous operations are. If they are networking operations, they will likely run almost entirely in parallel, because very little CPU is involved and the CPU work needed to move data in and out of the node.js process is handled outside of the single node.js Javascript thread.
Your own code to process the results of the request does not truly run in parallel. As you seem to know, node.js is single threaded so you never have more than one piece of your own Javascript actually executing at once (except for Worker Threads). But lots of operations such as networking operations and disk operations are handled by node.js outside of the single Javascript thread so that work can be done in parallel.
If you were, for example, to pass a series of synchronous functions to async.parallel, such as Javascript math computations, then nothing would be done in parallel. One would run to completion, then the next, and so on, because there's only ever one piece of Javascript running at once in the main JS thread (Worker Threads are a different topic).
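To make that concrete, here is a rough sketch. To keep it runnable without installing anything, it uses a tiny hypothetical stand-in called miniParallel that only mimics the bookkeeping async.parallel does (it is not the library's implementation). The helper names syncTask, timerTask and runDemo are made up for the illustration as well.

```javascript
// Hypothetical stand-in for async.parallel: starts every task, stores results
// in input order, and calls `done` once (first error, or when all are finished).
function miniParallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let finished = false;
  if (pending === 0) return done(null, results);
  tasks.forEach((task, i) => {
    task((err, value) => {
      if (finished) return;
      if (err) { finished = true; return done(err); }
      results[i] = value;
      if (--pending === 0) { finished = true; done(null, results); }
    });
  });
}

// A synchronous, CPU-bound task: it calls back before it returns.
function syncTask(name, log) {
  return (cb) => {
    log.push(`${name} start`);
    let sum = 0;
    for (let i = 0; i < 1e6; i++) sum += i; // blocks the single JS thread
    log.push(`${name} end`);
    cb(null, sum);
  };
}

// An asynchronous task: the waiting happens outside the JS thread.
function timerTask(name, ms, log) {
  return (cb) => {
    log.push(`${name} start`);
    setTimeout(() => { log.push(`${name} end`); cb(null, name); }, ms);
  };
}

// Runs two synchronous tasks, then two timer tasks, and reports the event log.
function runDemo(done) {
  const log = [];
  miniParallel([syncTask('A', log), syncTask('B', log)], () => {
    miniParallel([timerTask('C', 50, log), timerTask('D', 10, log)], (err, results) => {
      done(log, results);
    });
  });
}

runDemo((log, results) => console.log(log, results));
```

In the log, A finishes before B even starts, while C and D both start before either ends. The results array still comes back in input order, regardless of which timer fired first.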
async.parallel would be used in the same situations where Promise.all() would be used in a promise-based design. Its job is to track the completion (or first error) of multiple asynchronous operations.
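For comparison, a minimal sketch of the promise-based equivalent. The helpers delay and failAfter are hypothetical stand-ins for real asynchronous operations (network calls, disk reads and so on), and demo is just a name for the illustration.

```javascript
// Hypothetical async operation: resolves with `value` after `ms` milliseconds.
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Hypothetical failing operation: rejects after `ms` milliseconds.
function failAfter(ms, message) {
  return new Promise((_, reject) => setTimeout(() => reject(new Error(message)), ms));
}

async function demo() {
  // All succeed: results arrive in input order, not completion order.
  const results = await Promise.all([delay(30, 'one'), delay(10, 'two')]);

  // One fails: Promise.all rejects with the first error it sees.
  let firstError = null;
  try {
    await Promise.all([delay(30, 'slow'), failAfter(10, 'boom')]);
  } catch (err) {
    firstError = err.message;
  }
  return { results, firstError };
}

demo().then((outcome) => console.log(outcome));
```

Just like async.parallel, Promise.all() only tracks operations that are already running. It does not start threads or make anything asynchronous.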
True parallelism of synchronous operations in node.js is typically achieved with clustering (multiple identical node.js processes sharing the load), custom child processes (starting up child workers to run some specific operation), or Worker Threads. The OS can then apply multiple CPUs to the different processes or threads and achieve some actual parallelism.
I know that node relies on a single event thread. So there is no way for it to have parallel threads?
Within one single node.js process, running only Javascript code (no native code) and no Worker Threads, this is correct. Threads or other processes may be used internally by node.js when calling a function from Javascript that is implemented in native code.
But, as of node.js version 10, you can start up Worker Threads. A Worker Thread is an entirely separate Javascript run-time and event loop that can run truly in parallel with the main thread. The two cannot directly share regular variables and functions. Instead, they communicate by sending messages to one another, and those messages run through each thread's event loop. This controls synchronization and prevents problems with two threads trying to access the same variables at the same time and causing concurrency issues. There is work proceeding to create special types of data that can be accessed by more than one thread using some thread synchronization tools.
The main use for the current design of Worker Threads would be when you have some CPU-intensive code that doesn't need high-bandwidth access to the state of your application. So, for example, you can package up a job and ship it off to a Worker Thread. Imagine that you want to do some image analysis on a piece of video (entirely in Javascript) or you want to do some serious math computations.
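A minimal sketch of the "package up a job and ship it off" idea, assuming a node.js version where worker_threads is available without a flag. To keep it in one file, the worker's source is passed as a string with the eval option. The name fibonacciInWorker is made up for this illustration, and the fibonacci function is just a placeholder CPU-bound job.

```javascript
const { Worker } = require('worker_threads');

// Source for the worker, passed as a string so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  function fibonacci(n) {
    return n < 2 ? 1 : fibonacci(n - 2) + fibonacci(n - 1);
  }
  // The job arrives as workerData; the answer goes back as a message.
  parentPort.postMessage(fibonacci(workerData));
`;

// Hypothetical helper: runs one job in its own thread and resolves with the reply.
function fibonacciInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // No effect if the promise has already settled.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Two jobs, each on its own thread, while the main event loop stays free.
Promise.all([fibonacciInWorker(25), fibonacciInWorker(25)])
  .then((results) => console.log('Finished with', results));
```

The only things that cross the thread boundary here are the job description (workerData) and the result message. No variables are shared between the threads.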
Another question on Stack implies that async.parallel is using process.nextTick.
It is common for asynchronous management code (which is what the async library is) to use process.nextTick() to force a callback to always be called asynchronously, which gives predictable behavior when the code handles a mix of synchronous and asynchronous operations. For the same reason, the Promise specification requires .then() handlers to always be called on some future tick, never synchronously, even if the promise is already resolved.
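A small sketch of why that deferral matters. The lookup function and its cache are hypothetical and only illustrate the pattern: the cached path could answer synchronously, but it uses process.nextTick() so the callback never fires before lookup() returns. The name traceOrder is also made up for the demo.

```javascript
const cache = new Map();

// Hypothetical lookup: a cached value is available immediately,
// an uncached one needs an asynchronous step.
function lookup(key, cb) {
  if (cache.has(key)) {
    // Defer so callers always see the same (asynchronous) ordering.
    process.nextTick(cb, null, cache.get(key));
    return;
  }
  setTimeout(() => {
    const value = key.toUpperCase();
    cache.set(key, value);
    cb(null, value);
  }, 5);
}

// Records when the callbacks run relative to the code that registered them.
function traceOrder(done) {
  const nextTickOrder = [];
  cache.set('a', 'A');
  lookup('a', () => nextTickOrder.push('callback'));
  nextTickOrder.push('after lookup()');

  const promiseOrder = [];
  Promise.resolve('ready').then(() => promiseOrder.push('then handler'));
  promiseOrder.push('after then()');

  // setImmediate fires after pending nextTick callbacks and promise handlers.
  setImmediate(() => done(nextTickOrder, promiseOrder));
}

traceOrder((nextTickOrder, promiseOrder) => console.log(nextTickOrder, promiseOrder));
```

In both cases the line after the call runs first, even though the value was already available. That is the predictable ordering the deferral is meant to guarantee.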
So essentially async.parallel is still just a concurrent function rather than a true parallel function?
The point of async.parallel is to track the completion of multiple asynchronous operations. It does not make anything asynchronous by itself. Each operation that you pass to async.parallel remains whatever it already is (synchronous or asynchronous). The async library does not change that.
async.parallel does not interact with or use Worker Threads on its own. It's for managing, in one thread, multiple asynchronous operations that can run in parallel because of their own native code implementations and their asynchronous interfaces. It does not run the functions you pass in separate Javascript threads.
Upvotes: 7