Erel Segal-Halevi

Reputation: 36813

An efficient Javascript set structure

After reading many similar questions:

I still have a question: suppose I have a large array of strings (several thousands), and I have to make many lookups (i.e. check many times whether a given string is contained in this array). What is the most efficient way to do this in Node.js ?

A. Sort the array of strings, then use binary search? or:

B. Convert the strings to keys of an object, then use the "in" operator?

I know that the complexity of A is O(log N), where N is the number of strings.

But I don't know the complexity of B.

IF a Javascript object is implemented as a hash table, then the complexity of B is, on average, O(1), which is better than A. However, I don't know if a Javascript object is really implemented as a hash table!
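
To make option B concrete, here is a minimal sketch (the array, lookup and query names are just placeholders):

var arrayOfStrings = ['alpha', 'beta', 'gamma'];   // placeholder data

// Option B: build a lookup object once, then test membership with "in".
var lookup = {};
for (var i = 0; i < arrayOfStrings.length; i++) {
    lookup[arrayOfStrings[i]] = true;
}

var found = 'beta' in lookup;   // true

Note that "in" also reports inherited properties (for example, 'toString' in lookup is true), so an object created with Object.create(null), or a hasOwnProperty check, is safer for arbitrary strings.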

Upvotes: 8

Views: 4574

Answers (2)

jfriend00

Reputation: 707786

Update for 2016

Since you're asking about node.js and it is 2016, you can now use either the ES6 Set or Map object, as both are now built in. Both allow you to use any string as a key. The Set object is appropriate when you just want to see if the key exists, as in:

if (mySet.has(someString)) {
    //code here
}
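
For the use case in the question, the Set can be built directly from the existing array, for example (the array name is a placeholder):

const arrayOfStrings = ['alpha', 'beta', 'gamma'];   // placeholder data
const mySet = new Set(arrayOfStrings);               // build once, then call .has() many times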

And, Map is appropriate when you want to store a value for that key as in:

if (myMap.has(someString)) {
    let val = myMap.get(someString);
    // do something with val here
}
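
For completeness, a Map is populated with .set() and read back with .get(), for example (keys and values here are placeholders):

const myMap = new Map();
myMap.set('alpha', 1);            // associate a value with a string key
const val = myMap.get('alpha');   // 1; .get() returns undefined for missing keys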

Both ES6 features are now built into node.js as of node V4 (the current version of node.js as of this edit is v6).

See this performance comparison for how much faster the Set operations are than many other choices.

Older Answer

All important performance questions should be tested with actual performance tests in a tool like jsperf.com. In your case, a javascript object uses a hash-table-like implementation because, without something that performs pretty well, the whole language implementation would be slow since so much of javascript uses objects.

String keys on an object would be the first thing I'd test and would be my guess for the best performer. Since the internals of an object are implemented in native code, I'd expect this to be faster than your own hashtable or binary search implemented in javascript.

But, as I started my answer with, you should really test your specific circumstance with the number and length of strings you are most concerned about in a tool like jsperf.
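
As a rough local alternative to jsperf, a quick-and-dirty timing sketch in node could look like this (sizes and names are arbitrary, and this is not a rigorous benchmark):

// Crude comparison of object-key lookup vs a linear array scan.
var words = [];
for (var i = 0; i < 5000; i++) words.push('word' + i);

var obj = {};
for (var j = 0; j < words.length; j++) obj[words[j]] = true;

var hit = false;
console.time('object key');
for (var k = 0; k < 100000; k++) { hit = 'word4999' in obj; }
console.timeEnd('object key');

var idx = -1;
console.time('indexOf');
for (var m = 0; m < 100000; m++) { idx = words.indexOf('word4999'); }
console.timeEnd('indexOf');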

Upvotes: 6

Andrey Sidorov

Reputation: 25466

For a fixed, large array of strings I suggest using some form of radix search. Also, take a look at the different data structures and algorithms (AVL trees, queues/heaps, etc.) in this package.
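
To illustrate the radix-search idea, here is a minimal (uncompressed) trie sketch; a real radix tree would additionally merge single-child chains into one node:

// Minimal trie: insert strings character by character, then test membership.
function createTrie() {
    return { children: {}, isWord: false };
}

function trieInsert(root, word) {
    var node = root;
    for (var i = 0; i < word.length; i++) {
        var ch = word[i];
        if (!node.children[ch]) node.children[ch] = createTrie();
        node = node.children[ch];
    }
    node.isWord = true;
}

function trieHas(root, word) {
    var node = root;
    for (var i = 0; i < word.length; i++) {
        node = node.children[word[i]];
        if (!node) return false;
    }
    return node.isWord;
}

var trie = createTrie();
['apple', 'app', 'banana'].forEach(function (w) { trieInsert(trie, w); });
trieHas(trie, 'app');    // true
trieHas(trie, 'appl');   // false

Lookup time is proportional to the length of the query string rather than to the number of stored strings.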

I'm pretty sure that using a JS object as storage for strings will put that object into 'hash mode'. Depending on the implementation, lookups could take anywhere from O(log n) to O(1) time. Look at some jsperf benchmarks to compare property lookup with binary search on a sorted array.
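
For the binary-search side of that comparison, a sketch over a sorted array of strings (the array must be sorted first, e.g. with arr.sort()):

// Returns true if target is present in the sorted array of strings.
function sortedIncludes(sortedArr, target) {
    var lo = 0, hi = sortedArr.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (sortedArr[mid] === target) return true;
        if (sortedArr[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return false;
}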

In practice, especially if I'm not going to use the code in the browser, I would offload this functionality to something like redis or memcached.
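
As an illustration, a membership check against a Redis set might look roughly like this; the node-redis v4 client API (createClient, sAdd, sIsMember) and a locally running Redis server are assumptions here, and the key and data are placeholders:

// Sketch only: load the strings into a Redis SET once, then check membership.
const { createClient } = require('redis');

async function lookup(someString) {
    const client = createClient();    // assumes redis is running locally
    await client.connect();

    await client.sAdd('myStrings', ['alpha', 'beta', 'gamma']);   // placeholder data, loaded once
    const found = await client.sIsMember('myStrings', someString);

    await client.quit();
    return found;
}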

Upvotes: 2
