Jacky Lee

Reputation: 1233

In node.js, how to declare a shared variable that can be initialized by master process and accessed by worker processes?

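I want the following: the master process loads a large table once, and the worker processes can then read it without reloading it.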

Here is my code, which obviously does not achieve my goal.

var my_shared_var;
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Load a large table from a file and save it into my_shared_var,
  // hoping the worker processes can access this shared variable
  // so that they do not need to reload the table from the file.
  // The loading typically takes 15 seconds.
  my_shared_var = load('path_to_my_large_table');

  // Fork worker processes
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // The following line of code actually outputs "undefined".
  // It seems each process has its own copy of my_shared_var.
  console.log(my_shared_var);

  // Then perform a query against my_shared_var.
  // The query should be performed by the worker processes;
  // otherwise the master process will become a bottleneck.
  var result = query(my_shared_var);
}

I have tried saving the large table into MongoDB so that each process can easily access the data. But the table is so large that MongoDB takes about 10 seconds to complete my query, even with an index. This is too slow and not acceptable for my real-time application. I have also tried Redis, which holds data in memory, but Redis is a key-value store and my data is a table. I also wrote a C++ program that loads the data into memory, and there the query took less than 1 second, so I want to emulate this in Node.js.

Upvotes: 29

Views: 21621

Answers (7)

Mauvis Ledford

Reputation: 42354

This question was posted in 2012, exactly 10 years ago. Since no other answer has mentioned it: Node.js now supports worker threads, which can share memory.

Directly from the docs:

Workers (threads) are useful for performing CPU-intensive JavaScript operations. Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.
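The following is a minimal sketch of what the quoted docs describe. It is not code from this answer. The main thread loads a table once into a SharedArrayBuffer and passes it to worker threads through workerData. Each worker wraps the same memory in an Int32Array and queries its own slice, so the table is never copied.

The integer table, the summing "query", and the names buildSharedTable and querySharedTable are hypothetical stand-ins for the asker's load() and query(). This approach only works for data that fits in typed arrays (numbers). Ordinary JavaScript objects cannot be stored in a SharedArrayBuffer.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source, run with { eval: true } so the sketch fits in one file.
// In a real app it would live in its own file, e.g. new Worker('./query-worker.js', ...).
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // This view wraps the SAME SharedArrayBuffer the main thread filled:
  // no copy of the table is made when the worker starts.
  const table = new Int32Array(workerData.shared);
  let sum = 0;
  for (let i = workerData.start; i < workerData.end; i++) sum += table[i];
  parentPort.postMessage(sum);
`;

// Hypothetical stand-in for the slow load(): the "table" is just the integers 0..n-1,
// written once by the main thread into shared memory.
function buildSharedTable(n) {
  const shared = new SharedArrayBuffer(n * Int32Array.BYTES_PER_ELEMENT);
  const table = new Int32Array(shared);
  for (let i = 0; i < n; i++) table[i] = i;
  return shared;
}

// Hypothetical query: each worker sums one slice of the shared table,
// and the main thread adds up the partial sums.
function querySharedTable(shared, workerCount) {
  const n = shared.byteLength / Int32Array.BYTES_PER_ELEMENT;
  const chunk = Math.ceil(n / workerCount);
  const jobs = [];
  for (let w = 0; w < workerCount; w++) {
    const start = w * chunk;
    const end = Math.min(n, start + chunk);
    jobs.push(new Promise((resolve, reject) => {
      // A SharedArrayBuffer inside workerData is shared with the worker, not cloned.
      const worker = new Worker(workerSource, { eval: true, workerData: { shared, start, end } });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  }
  return Promise.all(jobs).then((sums) => sums.reduce((a, b) => a + b, 0));
}
```

For example, `querySharedTable(buildSharedTable(1000), 4)` resolves to 499500, the sum of 0 through 999. The workers only read the table here. If they also wrote to it concurrently, the writes would need to be coordinated, for example with `Atomics`.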

Upvotes: 1

kyr0

Reputation: 535

This approach works to "share a variable"; it is a bit fancier than the way @Shivam presented. However, the module internally uses the same messaging API, so "shared memory" is a bit misleading here. In cluster, each worker is a separate process: cluster.fork() starts a new Node.js process running the same script, and the parent's memory is not shared with it. So there is no real shared memory, except low-level OS shared memory such as an shm device or a virtual shared memory page (Windows).

I implemented a native module for Node.js that uses native OS shared memory (which is real shared memory). With this technique, both processes read directly from an OS shared memory section. However, this solution doesn't apply well here, because it is limited to scalar values. You could, of course, JSON.stringify the data and share the serialized string. But the time spent stringifying and parsing is far from ideal for most use cases, especially for larger objects, where parsing/stringifying JSON with the standard library implementations becomes non-linear.

Thus, the cluster-shared-memory module seems the most promising solution for now:

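const cluster = require('cluster');
// Loaded in every process, including the master: the workers' set/get/mutex
// calls are requests that the master handles over the IPC channel.
require('cluster-shared-memory');

if (cluster.isMaster) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
} else {
  const sharedMemoryController = require('cluster-shared-memory');

  // `await` is not allowed at the top level of a CommonJS script,
  // so the worker logic runs inside an async function.
  (async () => {
    // Note: it must be a serializable object
    const obj = {
      name: 'Tom',
      age: 10,
    };
    // Set an object
    await sharedMemoryController.set('myObj', obj);
    // Get an object
    const myObj = await sharedMemoryController.get('myObj');
    console.log(myObj);
    // Mutually exclusive access: read-modify-write without races between workers
    await sharedMemoryController.mutex('myObj', async () => {
      const newObj = await sharedMemoryController.get('myObj');
      newObj.age = newObj.age + 1;
      await sharedMemoryController.set('myObj', newObj);
    });
  })().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}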

Upvotes: 1

Allen Luce

Reputation: 8389

If read-only access is fine for your application, try out my own shared memory module. It uses mmap under the covers, so data is loaded as it's accessed and not all at once. The memory is shared among all processes on the machine. Using it is super easy:

const Shared = require('mmap-object')

const shared_object = new Shared.Open('table_file')

console.log(shared_object.property)

It gives you a regular object interface to a key-value store of strings or numbers. It's super fast in my applications.

There is also an experimental read-write version of the module available for testing.

Upvotes: 6

Shiv

Reputation: 3275

In short, you need to share data from the MASTER with the WORKERS. That can be done easily using messages:

From Master to worker:

worker.send(jsonData);    // In the master

process.on('message', yourCallbackFunc);    // In the worker: yourCallbackFunc(jsonData) is called for each message

From Worker to Master:

process.send(jsonData);   // In the worker

worker.on('message', yourCallbackFunc);    // In the master: yourCallbackFunc(jsonData) is called for each message

This way you can send and receive data in both directions. Please mark this as the answer if you find it useful, so that other users can find it too. Thanks!
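Below is a minimal runnable sketch of this message-passing approach applied to the question. The master loads the table once and sends it to each worker. Each worker runs a query and sends the result back.

loadTable(), query(), the three-row table and the message shapes ({ type: 'table' } and { type: 'result' }) are hypothetical stand-ins for the asker's own code. The sketch uses cluster.isPrimary and falls back to cluster.isMaster on Node.js versions older than 16.

Messages are serialized when sent, so every worker ends up with its own full copy of the table. This avoids reloading the file in each worker, but not the per-process memory cost.

```javascript
const cluster = require('node:cluster');

// `isPrimary` replaced `isMaster` in Node.js 16; fall back for older versions.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

// Hypothetical stand-in for the asker's slow load('path_to_my_large_table').
function loadTable() {
  return [
    { id: 1, city: 'Oslo' },
    { id: 2, city: 'Lima' },
    { id: 3, city: 'Pune' },
  ];
}

// Hypothetical query: look up one row by id.
function query(table, id) {
  return table.find((row) => row.id === id) ?? null;
}

// In the primary, resolves with one result per worker once all have replied.
let done = null;

if (isPrimary) {
  const table = loadTable(); // loaded once, in the primary only
  const workerCount = 2;
  done = new Promise((resolve) => {
    const results = [];
    for (let i = 0; i < workerCount; i++) {
      const worker = cluster.fork();
      // Master -> worker: the table is serialized and copied into the worker.
      worker.send({ type: 'table', table });
      // Worker -> master: collect the answer, then let the worker exit.
      worker.on('message', (msg) => {
        if (msg.type !== 'result') return;
        results.push(msg.row);
        worker.disconnect();
        if (results.length === workerCount) resolve(results);
      });
    }
  });
} else {
  process.on('message', (msg) => {
    if (msg.type !== 'table') return;
    // This worker now holds its OWN copy of the table.
    const row = query(msg.table, 2);
    process.send({ type: 'result', row });
  });
}
```

Run it as a file (for example `node share-table.js`, a hypothetical name). cluster.fork() re-executes the current script in each worker.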

Upvotes: 17

Reza Roshan

Reputation: 446

You can use Redis.

Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.

redis.io

Upvotes: 2

Vadim Baryshev

Reputation: 26199

In Node.js, fork does not work like fork() in C++. It does not copy the current state of the process; it starts a new process. So in this case, variables are not shared. Every line of code runs in every process, but only the master process has the cluster.isMaster flag set to true. You need to load your data in every worker process. Be careful if your data is really huge, because every process will have its own copy. I think you should query parts of the data as soon as you need them, or accept the wait if you really need all of it in memory.

Upvotes: 6

Martin Blech

Reputation: 13553

You are looking for shared memory, which node.js just does not support. You should look for alternatives, such as querying a database or using memcached.

Upvotes: 10
