outlaw
outlaw

Reputation: 1193

Data structure for set operations

What data structure supports the following set operations efficiently both in time and space?

  1. union
  2. difference
  3. ismemberof
  4. add
  5. delete

I can think of 3 different ways to do these operations, suppose that we have two sets, their sizes are both N:

Bit Array:

1. O(N)  2.O(N)  3.O(1)  4.O(1)  5.O(1)

HashTable:

1. O(N)  2.O(N)  3.O(1)  4.O(1)  5.O(1)

Ordered Tree:

1. O(NlogN)  2.O(NlogN)  3.O(logN)  4.O(logN)  5.O(logN)

Bit Array and HashTable are fast but they use too much memory, Ordered Tree is slow but consumes less memory.

Please note: the set may contain other types except integer, such as float number or string

What other data structures are both fast and general, and space efficient?

Upvotes: 1

Views: 2213

Answers (3)

Arvind
Arvind

Reputation: 478

You could also look into Union-Find Data Structure

Upvotes: 0

Grigor Gevorgyan
Grigor Gevorgyan

Reputation: 6853

I'd suggest Binary Heap (the easiest one), Binomial Heap (speeding up union) and Fibonacci Heap (hardest to implement, but having the best known amortized times for standard operations).

Operation                 Binary Heap                 Binomial Heap                 Fibonacci heap (amortized)
1. union                          O(n)                            O(logn)                                  O(1)
2. difference               O(nlogn)                         O(nlogn)                                O(nlogn)
3. find(ismemberof)      O(n)                              O(n)                                     O(n)
4. add                           O(lgn)                           O(lgn)                                    O(1)
5. delete                       O(lgn)                           O(lgn)                                    O(lgn)

However, these structures are mostly used when insert, find/extract min(max), union and delete operations are needed. find and diff operations still have poor running time.

Upvotes: 1

Michael Anderson
Michael Anderson

Reputation: 73470

One option is to augment your ordered tree with a bloom filter to speed up the ismemberof type tests.

I think that the overall behaviour would be something like:

1. O(N log(N) )  2. O( ? )  3.O(1)  4.O(log(N))  5.O( log(N) )

However the exact details will depend on the size of the filter, the size of your sets and the size of your domain.

Another option may be Judy Arrays. I've heard good things about them for this kind of use, but have no personal experience.

Yet another option is a forrest approach (rather than a pure binary tree).

Upvotes: 2

Related Questions