Reputation: 25812
Here is an exercise (3-15) in the book "Algorithm Design Manual".
Design a data structure that allows one to search, insert, and delete an integer X in O(1) time (i.e. , constant time, independent of the total number of integers stored). Assume that 1 ≤ X ≤ n and that there are m + n units of space available, where m is the maximum number of integers that can be in the table at any one time. (Hint: use two arrays A[1..n] and B[1..m].) You are not allowed to initialize either A or B, as that would take O(m) or O(n) operations. This means the arrays are full of random garbage to begin with, so you must be very careful.
I am not really seeking for the answer, because I don't even understand what this exercise asks.
From the first sentence:
Design a data structure that allows one to search, insert, and delete an integer X in O(1) time
I can easily design a data structure like that. For example:
Because 1 <= X <= n, so I just have an bit vector of n slots, and let X be the index of the array, when insert, e.g., 5, then a[5] = 1; when delete, e.g., 5, then a[5] = 0; when search, e.g.,5, then I can simply return a[5], right?
I know this exercise is harder than I imagine, but what's the key point of this question?
Upvotes: 17
Views: 7190
Reputation: 3086
I first saw the idea in Cameron's answer in Jon Bentley Programming Pearls.
The idea is pretty simple but it's not straightforward to see why the initial random values that may be on the uninitialized arrays does not matter. This link explains pretty well the insertion and search operations. Deletion is left as an exercise, but is answered by one of the commenters:
remove-member(i):
if not is-member(i): return
j = dense[n-1];
dense[sparse[i]] = j;
sparse[j] = sparse[i];
n = n - 1
Upvotes: 1
Reputation: 91094
You are basically implementing a multiset with bounded size, both in number of elements (#elements <= m
), and valid range for elements (1 <= elementValue <= n
).
myCollection.search(x)
--> return True if x inside, else FalsemyCollection.insert(x)
--> add exactly one x to collectionmyCollection.delete(x)
--> remove exactly one x from collectionConsider what happens if you try to store 5 twice, e.g.
myCollection.insert(5)
myCollection.insert(5)
That is why you cannot use a bit vector. But it says "units" of space, so the elaboration of your method would be to keep a tally of each element. For example you might have [_,_,_,_,1,_,...]
then [_,_,_,_,2,_,...]
.
Why doesn't this work however? It seems to work just fine for example if you insert 5 then delete 5... but what happens if you do .search(5)
on an uninitialized array? You are specifically told you cannot initialize it, so you have no way to tell if the value you'll find in that piece of memory e.g. 24753
actually means "there are 24753 instances of 5
" or if it's garbage.
NOTE: You must allow yourself O(1)
initialization space, or the problem cannot be solved. (Otherwise a .search()
would not be able to distinguish the random garbage in your memory from actual data, because you could always come up with random garbage which looked like actual data.) For example you might consider having a boolean which means "I have begun using my memory" which you initialize to False, and set to True the moment you start writing to your m
words of memory.
If you'd like a full solution, you can hover over the grey block to reveal the one I came up with. It's only a few lines of code, but the proofs are a bit longer:
SPOILER: FULL SOLUTION
Setup:
Use N words as a dispatch table:locationOfCounts[i]
is an array of size N, with values in the rangelocation=[0,M]
. This is the location where the count ofi
would be stored, but we can only trust this value if we can prove it is not garbage. >!
(sidenote: This is equivalent to an array of pointers, but an array of pointers exposes you being able to look up garbage, so you'd have to code that implementation with pointer-range checks.)
To find out how manyi
s there are in the collection, you can look up the valuecounts[loc]
from above. We use M words as the counts themselves:counts
is an array of size N, with two values per element. The first value is the number this represents, and the second value is the count of that number (in the range [1,m]). For example a value of(5,2)
would mean that there are 2 instances of the number5
stored in the collection.
(M words is enough space for all the counts. Proof: We know there can never be more than M elements, therefore the worst-case is we have M counts of value=1. QED)
(We also choose to only keep track of counts >= 1, otherwise we would not have enough memory.)
Use a number callednumberOfCountsStored
that IS initialized to 0 but is updated whenever the number of item types changes. For example, this number would be 0 for{}
, 1 for{5:[1 times]}
, 1 for{5:[2 times]}
, and 2 for{5:[2 times],6:[4 times]}
.
1 2 3 4 5 6 7 8...
locationOfCounts[<N]: [☠, ☠, ☠, ☠, ☠, 0, 1, ☠, ...]
counts[<M]: [(5,⨯2), (6,⨯4), ☠, ☠, ☠, ☠, ☠, ☠, ☠, ☠..., ☠]
numberOfCountsStored: 2
Below we flush out the details of each operation and prove why it's correct:
Algorithm:
There are two main ideas: 1) we can never allow ourselves to read memory without verifying that is not garbage first, or if we do we must be able to prove that it was garbage, 2) we need to be able to prove inO(1)
time that the piece ofcounter
memory has been initialized, with onlyO(1)
space. To go about this, theO(1)
space we use isnumberOfItemsStored
. Each time we do an operation, we will go back to this number to prove that everything was correct (e.g. see ★ below). The representation invariant is that we will always store counts incounts
going from left-to-right, sonumberOfItemsStored
will always be the maximum index of the array that is valid.
.search(e)
-- ChecklocationsOfCounts[e]
. We assume for now that the value is properly initialized and can be trusted. We proceed to checkcounts[loc]
, but first we check ifcounts[loc]
has been initialized: it's initialized if 0<=loc
<numberOfCountsStored
(if not, the data is nonsensical so we return False). After checking that, we look upcounts[loc]
which gives us anumber,count
pair. Ifnumber
!=e
, we got here by following randomized garbage (nonsensical), so we return False (again as above)... but if indeednumber
==e
, this proves that the count is correct (★proof:numberOfCountsStored
is a witness that this particularcounts[loc]
is valid, andcounts[loc].number
is a witness thatlocationOfCounts[number]
is valid, and thus our original lookup was not garbage.), so we would return True.
.insert(e)
-- Perform the steps in.search(e)
. If it already exists, we only need to increment the count by 1. However if it doesn't exist, we must tack on a new entry to the right of thecounts
subarray. First we incrementnumberOfCountsStored
to reflect the fact that this new count is valid:loc = numberOfCountsStored++
. Then we tack on the new entry:counts[loc] = (e,⨯1)
. Finally we add a reference back to it in our dispatch table so we can look it up quicklylocationOfCounts[e] = loc
.
.delete(e)
-- Perform the steps in.search(e)
. If it doesn't exist, throw an error. If the count is >= 2, all we need to do is decrement the count by 1. Otherwise the count is 1, and the trick here to ensure the wholenumberOfCountsStored
-counts[...]
invariant (i.e. everything remains stored on the left part ofcounts
) is to perform swaps. If deletion would get rid of the last element, we will have lost acounts
pair, leaving a hole in our array:[countPair0, countPair1, _hole_, countPair2, countPair{numberOfItemsStored-1}, ☠, ☠, ☠..., ☠]
. We swap this hole with the last countPair, decrementnumberOfCountsStored
to invalidate the hole, and updatelocationOfCounts[the_count_record_we_swapped.number]
so it now points to the new location of the count record.
Upvotes: 17
Reputation: 1725
Here is an idea:
treat the array B[1..m] as a stack, and make a pointer p to point to the top of the stack (let p = 0 to indicate that no elements have been inserted into the data structure). Now, to insert an integer X, use the following procedure:
p++;
A[X] = p;
B[p] = X;
Searching should be pretty easy to see here (let X' be the integer you want to search for, then just check that 1 <= A[X'] <= p, and that B[A[X']] == X'). Deleting is trickier, but still constant time. The idea is to search for the element to confirm that it is there, then move something into its spot in B (a good choice is B[p]). Then update A to reflect the pointer value of the replacement element and pop off the top of the stack (e.g. set B[p] = -1 and decrement p).
Upvotes: 3
Reputation: 54554
It's easier to understand the question once you know the answer: an integer is in the set if A[X]<total_integers_stored && B[A[X]]==X
.
The question is really asking if you can figure out how to create a data structure that is usable with a minimum of initialization.
Upvotes: 1