Alex

Reputation: 68406

Multi-threading for zip in nodejs

Can zip and unzip operations be made multi-threaded in Node.js?

There are a bunch of modules like yauzl, but none of them uses multiple threads, and you can't start multiple threads yourself with node-cluster or something like that, because each zip file must be handled in a single thread.

Upvotes: 8

Views: 2883

Answers (5)

Matt Simerson

Reputation: 1115

Can zip and unzip operations be made multi-threaded in Node.js?

Yes.

...and you can't start multiple threads yourself ... because each zip file must be handled in a single thread

I suspect your premise is faulty. Why exactly do you think a node process cannot start multiple threads? Here is an app I'm running which is using the very mature node.js cluster module with a parent process acting as a supervisor and two child processes doing heavily network and disk I/O bound tasks.

[Image: top output showing node.js processes using CPU threads]

As you can see in the C column, each process is running on a separate thread. This lets the master process remain responsive for command and control tasks (like spawning/reaping workers) while the worker processes are CPU or disk bound. This particular server accepts files from the network, sometimes decompresses them, and feeds them through external file processors. In other words, it's a task that includes compression like you describe.

I'm not sure you'd want to use worker threads based on this snippet from the docs:

Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.

To me, that description screams, "crypto!" In the past I've spawned child processes when having to perform any expensive crypto operations.

In another project I use node's child_process module and kick off a new child process each time I have a batch of files to compress. That particular service sees a list of ~400 files with names like process-me-2019.11.DD.MM and concatenates them into a single process-me-2019-11-DD file. It takes a while to compress so spawning a new process avoids blocking on the main thread.
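A minimal sketch of the child-process approach described above, not the code from either project. It assumes gzip (via the built-in zlib module) as the compression step; `gzipInChild` and `childScript` are hypothetical names chosen for illustration. The child is a second Node.js process that gzips whatever arrives on stdin, so the compression work never runs on the parent's main thread.

```javascript
const { spawn } = require('child_process');

// Script executed by the child process: gzip stdin to stdout.
const childScript = `
  const zlib = require('zlib');
  process.stdin.pipe(zlib.createGzip()).pipe(process.stdout);
`;

// Hypothetical helper: compress a Buffer in a separate Node.js process.
function gzipInChild(input) {
  return new Promise((resolve, reject) => {
    // process.execPath is the path of the currently running node binary.
    const child = spawn(process.execPath, ['-e', childScript], {
      stdio: ['pipe', 'pipe', 'inherit'],
    });
    const chunks = [];
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`child exited with code ${code}`));
    });
    // Send the data to compress, then close stdin so the child can finish.
    child.stdin.end(input);
  });
}
```

Usage would look like gzipInChild(Buffer.from('some data')).then((compressed) => { ... }). Spawning a process per batch has a startup cost, so this suits long-running batches better than many tiny jobs.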

Upvotes: 2

Akshay

Reputation: 639

Here is how to do multi-threading in Node.js. You will have to create the three files below.

index.mjs

import run from './Worker.mjs';

/**
 * Build your input list of zip files here and pass them to `run` one file name at a time,
 * using a loop or similar. `run` returns a promise.
 * Example: run( <your_input> ).then( <your_output> );
 **/

Worker.mjs

import { Worker } from 'worker_threads';

function runService(id, options) {
    return new Promise((resolve, reject) => {
        // Pass your input to the worker; it is available there as `workerData`.
        const worker = new Worker('./src/WorkerService.mjs', { workerData: { id, options } });
        worker.on('message', res => resolve({ res: res, threadId: worker.threadId }));
        worker.on('error', reject);
        worker.on('exit', code => {
            if (code !== 0)
                reject(new Error(`Worker stopped with exit code ${code}`));
        });
    });
}

async function run(id, options) {
    return await runService(id, options);
}

export default run;

WorkerService.mjs

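import { parentPort, workerData } from 'worker_threads';

// Your logic for zipping a file goes here; `workerData` holds <your_input>.
// Send the result back with parentPort.postMessage(<your_output>) so the promise in Worker.mjs resolves.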
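To make the pattern above concrete, here is a single-file sketch. It is an illustration under assumptions, not a drop-in replacement for the three files: it uses gzip from the built-in zlib module in place of a real zip library, passes the worker body inline with `eval: true` so everything fits in one file, and uses CommonJS `require` instead of `.mjs` imports. The names `gzipInWorker` and `workerSource` are hypothetical.

```javascript
const { Worker } = require('worker_threads');

// Worker body, passed inline so this sketch fits in one file.
// In the three-file layout this would live in WorkerService.mjs.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const zlib = require('zlib');
  // gzipSync blocks only this worker thread, not the main thread.
  const compressed = zlib.gzipSync(Buffer.from(workerData.text));
  parentPort.postMessage(compressed);
`;

// Hypothetical helper: compress a string on a worker thread.
function gzipInWorker(text) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { text } });
    // The Buffer arrives as a Uint8Array after structured cloning.
    worker.once('message', (data) => resolve(Buffer.from(data)));
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}
```

To handle several inputs at once, you could start one worker per input and wait with Promise.all, ideally capping the number of workers near the number of CPU cores.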

Let me know if it helps.

Upvotes: 1

Vivek Anand

Reputation: 1374

Node.js uses libuv and worker threads. Worker threads are a way to run operations in a multi-threaded manner. libuv maintains a thread pool, and you can increase its size beyond the Node.js default. You can use both to improve Node.js performance for your operation.

Here is the official documentation for worker threads: https://nodejs.org/api/worker_threads.html

See how you can increase the thread pool size in Node.js here: print libuv threadpool size in node js 8

Upvotes: 3

Strike Eagle

Reputation: 862

According to the zlib documentation:

Threadpool Usage: All zlib APIs, except those that are explicitly synchronous, use libuv's threadpool. This can lead to surprising effects in some applications, such as subpar performance (which can be mitigated by adjusting the pool size) and/or unrecoverable and catastrophic memory fragmentation. https://nodejs.org/api/zlib.html#zlib_threadpool_usage

According to libuv's threadpool documentation, you can set the environment variable UV_THREADPOOL_SIZE to change the maximum size.
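As a rough sketch of what that means in practice: every asynchronous zlib call is handed to libuv's threadpool, so starting several of them together lets them run in parallel up to the pool size (4 by default). The helper name `gzipAll` is hypothetical, and gzip stands in here for whatever compression you actually need.

```javascript
const zlib = require('zlib');
const { promisify } = require('util');

// Promise-returning wrapper around the asynchronous zlib.gzip.
const gzip = promisify(zlib.gzip);

// Hypothetical helper: compress several buffers concurrently.
// Each asynchronous gzip call is dispatched to libuv's threadpool,
// so the main thread stays free while the pool does the work.
function gzipAll(buffers) {
  return Promise.all(buffers.map((buf) => gzip(buf)));
}
```

To give the pool more threads, set the variable when launching the process, for example UV_THREADPOOL_SIZE=8 node app.js (app.js being a placeholder for your entry point).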

If you instead wish to compress many small files at the same time, you can use Worker Threads: https://nodejs.org/api/worker_threads.html

On reading your question again, it seems like you want to handle multiple files. Use Worker Threads; these will not block your main thread, and you can get the output back from them via promises.

Upvotes: 6

Sudhir Roy

Reputation: 266

There is no way to do multi-threading in pure Node.js unless you use a third-party library. You can run work concurrently using promises. If you don't want to overload the main thread that Node uses, you can put the work on a queue such as RabbitMQ or a Redis-backed queue. The queue runs in its own process, so your main thread will never be blocked.

Upvotes: 0
