Is there any suitable data structure or optimisation for this problem?

I've got a Node.JS server and I have a dictionary(hash/map) in it (key is a Number, value - Array). Each element of a dictionary is an Array of IDs(strings) and has many elements. Each element of an Array is unique within it's array. For example:

let map = {2333:['id1', 'id2', 'id3', 'id4'], 1234:['id3', 'id4', 'id5'], 123213:['id4', 'id5', 'id77']}

There are queries to the server, which add new elements to arrays or to the dictionary. This happens really often. And there is another type of query, with a set of several keys from our dictionary as a parameter. I need to iterate through that set, find all arrays in the dictionary by the keys from the set and count the number of times each ID occurred. Here is my straightforward solution:

let queryArray = [1234, 123213];
let result = {};//Resulting hash of ID's occurrences
for(let i=0; i<queryArray.length; i++){
    let key = queryArray[i];
    if(!key) continue;
    let array = map[key];
    for(let j=0; j<array.length; j++){
        let id = array[j];
        if(!result[id]) result[id] = 0;
        result[id]++;
    }
 }
//result = {'id3':1, 'id4':2, 'id5':2, 'id77':1};

This operation happens really often on the server and I need to optimize it somehow. Do you have any ideas? The programming language of the answer does not matter.

Upvotes: 3

Answers (2)

Doğancan Arabacı

Reputation: 3992

You can create a meta data object just for your counting query.
If you can afford to duplicate the size of data you have, you can apply this method.

const map = {
  2333: ["id1", "id2", "id3", "id4"],
  1234: ["id3", "id4", "id5"],
  123213: ["id4", "id5", "id77"]
};

const counts = {
  2333: { id1: 1, id2: 1, id3: 1, id4: 1 },
  1234: { id3: 1, id4: 1, id5: 1 },
  123213: { id4:1, id5: 1, id77: 1 }
}

// queryArray 1234, 123213
function getQuery(queryArray) {
  let result = {}; //Resulting hash of ID's occurrences
  queryArray.forEach(query => {
    const count = counts[query]
    Object.keys(count).forEach(id => {
      result[id] = (result[id] || 0) + count[id]
    })
  })
  return result
}

console.log(getQuery([1234,123213]))

This approach will get rid of counting id occurrences but increase your memory usage. However I think you need speed more than memory.

One more thing to implement is how to maintain counts object. But this depends on how you add ids to your map. Whenever you add/remove something, you need to update counts object.

Upvotes: 2

rustyx

Reputation: 85452

Just create a second map to serve as the reverse dictionary:

let map = {
    2333: ['id1', 'id2', 'id3', 'id4'],
    1234: ['id3', 'id4', 'id5'],
    123213: ['id4', 'id5', 'id77']
}

let idcounts = {
    'id1': 1,
    'id2': 1,
    'id3': 2,
    'id4': 3,
    'id5': 2,
    'id77': 1
}

Increment idcounts[id] when adding a new id, and decrement when removing.

If you have a lot of overlapping IDs, consider creating a separate map to map ID strings to integer keys, and then work with ints in the dictionaries.

Having said that, Node.JS is really not well-suited for CPU- or memory-intensive work, due to its single-threaded architecture. You might want to consider offloading the lookup work to an external service like Redis, or use a language like Go or C++ with a mutex around the maps to allow for parallel lookup access.

Upvotes: 0

Is there any suitable data structure or optimisation for this problem?

Answers (2)

Related Questions