Reputation: 13458
I want an efficient data structure that represents a set of integers from the range 0..k, where k is very small, let's say less than 1024.
I have come up with an implementation with the following complexities:
initialize: O(1)
size: O(1)
contains: O(1)
insert: O(1)
remove: O(1)
clear: O(1)
union: O(k)
However, I suspect my O(1)s might really be O(log(k)), since we have to read O(log(k)) digits somewhere. The memory requirement is O(k).
I have not implemented equals, intersection or difference because I don't need them, but they all would be O(k). My implementation does support iteration.
This seems like it could come up in many different problems, so I'm wondering has my implementation been done before?
And more importantly: Can we do better?
Obviously better is a little subjective, but something with only O(1) and O(log(k)) everywhere might be interesting. I suspect bitsets could be more practical when k is a little bigger, but I'm not sure for such small k.
Here is pseudocode of what I have. It works by allocating an array and a map (implemented as another array) of lengths k and k+1, but only caring about the first m elements of the array, where m is the current size of the set.
class IntSet:
    m: int
    arr: array[int]
    map: array[int]

    init(k):
        arr: int[k]   # allocate, but don't initialize
        map: int[k+1] # allocate, but don't initialize
        m = 0

    size(): m

    contains(x: int): map[x] < m and arr[map[x]] == x

    insert(x: int):
        if not contains(x):
            arr[m] = x
            map[x] = m
            m += 1

    remove(x: int):
        if contains(x):
            m -= 1
            arr[map[x]] = arr[m]  # move x off the end
            map[arr[m]] = map[x]

    clear(): m = 0

    union(s: IntSet):
        for i in 0..s.m:
            if not contains(s.arr[i]):
                insert(s.arr[i])

    items():
        for i in 0..m:
            yield arr[i]
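For reference, this structure has been done before: it is the classic "sparse set" (Briggs and Torczon describe it for compiler data-flow sets). A runnable C sketch of the pseudocode above might look like the following; the fixed K, the IntSet name and the is_* function names are mine. The arrays are deliberately never initialized, which is what makes init O(1): a stale map[x] can never produce a false positive, because arr[0..m) holds only genuine members and the guard checks arr[map[x]] == x.

```c
#define K 1024  /* elements are drawn from 0..K */

typedef struct {
    int m;            /* current size of the set */
    int arr[K + 1];   /* dense array: the first m slots hold the members */
    int map[K + 1];   /* map[x] = position of x in arr, if x is a member */
} IntSet;

static void is_init(IntSet *s) { s->m = 0; }   /* also serves as clear() */

static int is_contains(const IntSet *s, int x) {
    /* the unsigned cast also rejects negative garbage in map[x] */
    return (unsigned)s->map[x] < (unsigned)s->m && s->arr[s->map[x]] == x;
}

static void is_insert(IntSet *s, int x) {
    if (!is_contains(s, x)) {
        s->arr[s->m] = x;
        s->map[x] = s->m;
        s->m++;
    }
}

static void is_remove(IntSet *s, int x) {
    if (is_contains(s, x)) {
        s->m--;
        s->arr[s->map[x]] = s->arr[s->m];  /* move the last member into x's slot */
        s->map[s->arr[s->m]] = s->map[x];  /* and fix that member's map entry */
    }
}
```

(Strictly speaking, reading the uninitialized arrays is indeterminate in C; in practice the guard makes the stale values harmless.)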
EDIT:
Based on my understanding, I imagine a bitset implementation would look like:

initialize: O(k)
size: O(popcount of k bits) (or O(1) if a count is maintained)
contains: O(1)
insert: O(1)
remove: O(1)
clear: O(k)
union: O(bitwise OR of k bits)
If we use a bitset with 1024 bits I'm not sure if this is better.
Upvotes: 0
Views: 889
Reputation: 11284
This will look a little bit hacky, but you can store the set in an array of 64-bit integers, int64[16], or similarly 32-bit integers, int32[32]. The top 4 bits of an element determine which array index it belongs to, and the low 6 bits determine which bit is set in that integer. All operations will be O(1), except clear and union, which are O(k / 64), where k is the maximum element in the set.
So first, we have int64[16] set.
To add an element x:

set[x >> 6] |= 1ULL << (0x3F & x) // keep only the low 6 bits of x; 1ULL so the shift is done in 64 bits
To remove an element:

set[x >> 6] &= ~(1ULL << (0x3F & x)) // &= ~ rather than ^=, so removing an absent element is a no-op
To clear:

for (int i = 0; i < 16; i++) {
    set[i] = 0;
}
To union two sets a and b:

for (int i = 0; i < 16; i++) {
    set[i] = a[i] | b[i];
}
As requested, to check whether the set contains x:

(set[x >> 6] & (1ULL << (0x3F & x))) != 0
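Putting the pieces together, a self-contained C sketch might look like this (the BitSet1024 type and the bs_* names are mine; note the 1ULL literals, which are needed so the shifts happen in 64-bit arithmetic rather than overflowing a 32-bit int):

```c
#include <stdint.h>
#include <string.h>

/* 1024-bit set stored as 16 x 64-bit words */
typedef struct { uint64_t w[16]; } BitSet1024;

static void bs_clear(BitSet1024 *s)         { memset(s->w, 0, sizeof s->w); }
static void bs_insert(BitSet1024 *s, int x) { s->w[x >> 6] |=  1ULL << (x & 0x3F); }
static void bs_remove(BitSet1024 *s, int x) { s->w[x >> 6] &= ~(1ULL << (x & 0x3F)); }

static int bs_contains(const BitSet1024 *s, int x) {
    return (int)((s->w[x >> 6] >> (x & 0x3F)) & 1);
}

/* union is one OR per word: 16 iterations for k < 1024 */
static void bs_union(BitSet1024 *out, const BitSet1024 *a, const BitSet1024 *b) {
    for (int i = 0; i < 16; i++)
        out->w[i] = a->w[i] | b->w[i];
}
```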
To keep track of the number of elements in the set, an obvious solution is to go through every bit of each word in the array, which would be O(k). There exist ways to count the set bits of one word in O(1), like this one: How to count the number of set bits in a 32-bit integer?
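For example, with GCC or Clang one hardware popcount per 64-bit word brings size down to 16 builtin calls (a sketch; __builtin_popcountll is compiler-specific, not standard C):

```c
#include <stdint.h>

/* Count the elements of a 1024-bit set stored as uint64_t[16]:
   one popcount per word, 16 words total. */
static int bitset_size(const uint64_t set[16]) {
    int n = 0;
    for (int i = 0; i < 16; i++)
        n += __builtin_popcountll(set[i]);  /* number of set bits in this word */
    return n;
}
```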
In addition, if we switch to an int8[128] set instead, meaning only the low 3 bits of the number determine which bit to set, we can keep a hardcoded 256-entry table of how many bits are set in each byte. For example, a union that also maintains the element count:
int numberOfElement = 0;
int bitCount[1 << 8]; // pre-populated, so bitCount[i] gives how many bits are set in i

for (int i = 0; i < 128; i++) {
    set[i] = a[i] | b[i];
    numberOfElement += bitCount[set[i]];
}
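The pre-populated bitCount table itself can be built once with a simple recurrence (a sketch; the function name is mine): the popcount of i equals the popcount of i with its lowest bit shifted off, plus that lowest bit.

```c
/* bitCount[i] = number of set bits in byte i, for all 256 byte values */
static int bitCount[1 << 8];

static void build_bit_count(void) {
    bitCount[0] = 0;
    for (int i = 1; i < (1 << 8); i++)
        bitCount[i] = bitCount[i >> 1] + (i & 1);  /* popcount(i) = popcount(i>>1) + low bit */
}
```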
Upvotes: 1
Reputation: 11968
Asymptotically your data structure is fairly neat (memory is O(k), since you need about 2*k ints to store everything).
I think the constant might be a bit high, and memory usage is less than ideal.
Memory-wise, you are right that a bitset of 1024 bits (128 bytes) would work just as well. Compare that with your set: 2 * 1024 * 4 bytes = 8 KB.
You are concerned about initialization. You need to allocate memory anyway, and there's a good chance you can find 128 bytes already available (compared to 8 KB). You do need to initialize them, but on modern architectures that might be one or two SIMD instructions (the compiler will do it for you when optimizations are enabled and you specify a target platform that supports them).
In addition, 128 bytes fit in two 64-byte cache lines, so all your data will likely fit into L1 cache.
Contains is called a lot in your code. In your version you need to evaluate arr[map[x]] == x. That's two chained lookups, so you have twice the chance of a cache miss (and since the data is so large, misses are likely). It also can't be easily optimized: the CPU has to wait for the value from the first lookup before it can issue the second one.
So, all in all, memory aside, the two data structures are fairly similar asymptotically. In practice I would bet the bitset is going to be significantly faster, especially if you enable optimizations in your compiler.
In any case, to decide you should write a benchmark of your code and run it on the intended platform. Then swap the data-structures and pick the one that gives you the best results.
Upvotes: 1