Reputation: 601
I need a data structure that supports FAST insertion and deletion of (key, value) pairs, as well as "get random key", which does the same thing as random.choice(dict.keys()) for a dictionary. I've searched on the internet, and most people seem to be satisfied with the random.choice(dict.keys()) approach, despite it being linear time.
I'm aware that implementing this faster is possible:
Is there any easy way to get this in Python, though? It seems like there should be!
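For reference, here is a minimal sketch of the linear-time baseline described above. In Python 3, `dict.keys()` returns a view rather than a list, so the keys have to be copied into a sequence first. The helper name `random_key_linear` is made up for illustration.

```python
import random

def random_key_linear(d):
    """Return a uniformly random key of d in O(len(d)) time."""
    # list(d) copies every key, which is what makes this approach linear.
    return random.choice(list(d))

d = {"a": 1, "b": 2, "c": 3}
k = random_key_linear(d)
print(k in d)  # True
```

This is fine for small dictionaries. The per-call copy only becomes a problem when random keys are drawn often from a large one.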
Upvotes: 21
Views: 9509
Reputation: 9
To get O(1) time you need an array and a dictionary that maps each value stored in the array to its index.
When adding a value, you append it to the array and record its index in the dictionary.
Random access is then O(1), because the array can be indexed directly.
When removing a value, look up its index in the dictionary. Then overwrite that slot in the array with the last value in the array (unless the value being removed is already the last element) and pop() the last element. After that, update the dictionary entry of the moved value (the former last value) so it points to the index of the deleted value. Finally, delete the removed value's entry from the dictionary, since it no longer belongs there.
import random

class RandomizedSet:
    def __init__(self):
        self.container = []  # values, in arbitrary order
        self.indices = {}    # value -> its index in self.container

    def insert(self, val: int) -> bool:
        if val in self.indices:
            return False
        self.indices[val] = len(self.container)
        self.container.append(val)
        return True

    def remove(self, val: int) -> bool:
        if val not in self.indices:
            return False
        idxOfValueToRemove = self.indices[val]
        lastValue = self.container[-1]
        # Move the last value into the freed slot unless val is already last.
        if idxOfValueToRemove < len(self.container) - 1:
            self.container[idxOfValueToRemove] = lastValue
            self.indices[lastValue] = idxOfValueToRemove
        self.container.pop()
        del self.indices[val]
        return True

    def getRandom(self) -> int:
        # Index the list directly; copying it with list(...) would be O(n).
        return random.choice(self.container)
Upvotes: 1
Reputation: 17902
This may not be directly relevant to the use case above, but this is the question that comes up when searching for a way to get hold of "any" key in a dictionary.
If you don't need a truly random choice, but just need some arbitrary key, here are two simple options I've found:
key = next(iter(d)) # may be a little expensive, but presumably O(1)
The second is really useful only if you're happy to consume the key+value from the dictionary, and due to the mutation(s) will not be as algorithmically efficient:
key, value = d.popitem()  # may not be O(1), especially with the re-insertion below
if MUST_LEAVE_VALUE:
    d[key] = value
Upvotes: 5
Reputation: 91092
[edit: Completely rewritten, but keeping question here with comments intact.]
Below is an implementation of a dictionary wrapper with O(1) get/insert/delete, and O(1) picking of a random element.
The main idea is that we want to have an O(1) but arbitrary map from range(len(mapping)) to the keys. This lets us draw random.randrange(len(mapping)) and pass it through the mapping.
This is very difficult to implement until you realize that we can take advantage of the fact that the mapping can be arbitrary. The key idea to achieve a hard bound of O(1) time is this: whenever you delete an element, you swap it with the highest arbitrary-id element, and update any pointers.
import random

class RandomChoiceDict(object):
    def __init__(self):
        self.mapping = {}  # wraps a dictionary
                           # e.g. {'a':'Alice', 'b':'Bob', 'c':'Carrie'}
        # the arbitrary mapping mentioned above
        self.idToKey = {}  # e.g. {0:'a', 1:'c', 2:'b'},
                           #   or {0:'b', 1:'a', 2:'c'}, etc.
        self.keyToId = {}  # needed to help delete elements
Get, set, and delete:
    def __getitem__(self, key):  # O(1)
        return self.mapping[key]

    def __setitem__(self, key, value):  # O(1)
        if key in self.mapping:
            self.mapping[key] = value
        else:  # new item
            newId = len(self.mapping)
            self.mapping[key] = value
            # add it to the arbitrary bijection
            self.idToKey[newId] = key
            self.keyToId[key] = newId

    def __delitem__(self, key):  # O(1)
        del self.mapping[key]  # O(1) average case
                               # see http://wiki.python.org/moin/TimeComplexity
        emptyId = self.keyToId[key]
        largestId = len(self.mapping)  # about to be deleted
        largestIdKey = self.idToKey[largestId]  # going to store this in emptyId
        # swap deleted element with highest-id element in arbitrary map:
        self.idToKey[emptyId] = largestIdKey
        self.keyToId[largestIdKey] = emptyId
        del self.keyToId[key]
        del self.idToKey[largestId]
Picking a random (key,element):
    def randomItem(self):  # O(1)
        r = random.randrange(len(self.mapping))
        k = self.idToKey[r]
        return (k, self.mapping[k])
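As a variation on the same swap-with-last idea, the id-to-key map can be a plain list, since a list already is a map from range(n) to its elements. The following is a self-contained sketch rather than the class above; the name `ListBackedRandomDict` and its attribute names are made up for illustration, and list appends are amortized O(1) rather than strictly O(1).

```python
import random

class ListBackedRandomDict:
    """Hypothetical variant: a dict plus a list of keys for O(1) random picks."""

    def __init__(self):
        self._values = {}  # key -> value
        self._keys = []    # position -> key, in arbitrary order
        self._pos = {}     # key -> position in self._keys

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        if key not in self._values:
            # New key: give it the next free position.
            self._pos[key] = len(self._keys)
            self._keys.append(key)
        self._values[key] = value

    def __delitem__(self, key):
        pos = self._pos.pop(key)  # raises KeyError before any other change
        del self._values[key]
        last = self._keys.pop()   # shrink the list by one
        if pos < len(self._keys):
            # The deleted key was not the last one: move the last key
            # into the freed slot and update its position.
            self._keys[pos] = last
            self._pos[last] = pos

    def random_item(self):
        k = random.choice(self._keys)  # IndexError if the dict is empty
        return k, self._values[k]

d = ListBackedRandomDict()
d["a"] = "Alice"
d["b"] = "Bob"
d["c"] = "Carrie"
del d["a"]
print(len(d))  # 2
print(d.random_item() in {("b", "Bob"), ("c", "Carrie")})  # True
```

Using a list avoids maintaining a separate integer-keyed dictionary, at the cost of relying on amortized list growth.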
Upvotes: 5
Reputation: 22123
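Here is a somewhat convoluted approach:
When adding a key, assign it an index (reusing a previously removed index if one is available, otherwise taking next_index and incrementing it). Then add the key, value and index to the dictionary (dictionary[key] = (index, value)) and add the key to the index-to-key dictionary (indexdict[index] = key).
To get a random key, draw a random index with random.randrange(0, next_index). If the index is not in the index-to-key dictionary, re-try (this should be rare).
Here is an implementation: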
import random

class RandomDict(object):
    def __init__(self):  # O(1)
        self.dictionary = {}         # key -> [index, value]
        self.indexdict = {}          # index -> key
        self.next_index = 0
        self.removed_indices = None  # linked list of freed indices: (index, rest)
        self.len = 0

    def __len__(self):  # might as well include this
        return self.len

    def __getitem__(self, key):  # O(1)
        return self.dictionary[key][1]

    def __setitem__(self, key, value):  # O(1)
        if key in self.dictionary:  # O(1)
            self.dictionary[key][1] = value  # O(1)
            return
        if self.removed_indices is None:
            index = self.next_index
            self.next_index += 1
        else:
            # reuse the most recently freed index
            index = self.removed_indices[0]
            self.removed_indices = self.removed_indices[1]
        self.dictionary[key] = [index, value]  # O(1)
        self.indexdict[index] = key  # O(1)
        self.len += 1

    def __delitem__(self, key):  # O(1)
        index = self.dictionary[key][0]  # O(1)
        del self.dictionary[key]  # O(1)
        del self.indexdict[index]  # O(1)
        self.removed_indices = (index, self.removed_indices)
        self.len -= 1

    def random_key(self):  # expected O(next_index/len) tries,
        if self.len == 0:  # which is usually close to O(1)
            raise KeyError
        while True:
            r = random.randrange(0, self.next_index)
            if r in self.indexdict:
                return self.indexdict[r]
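The retry loop in random_key is a form of rejection sampling: each draw hits a live index with probability len / next_index, so the expected number of draws is next_index / len. The standalone sketch below illustrates this under made-up data; the function name `draw_live_index` and the sample `indexdict` are assumptions for illustration, not part of the class above.

```python
import random

def draw_live_index(indexdict, next_index):
    """Return (key, number_of_draws) using the same retry loop as random_key."""
    if not indexdict:
        raise KeyError("empty")
    draws = 0
    while True:
        draws += 1
        r = random.randrange(next_index)
        if r in indexdict:
            return indexdict[r], draws

# 2 live indices out of 8 ever issued: about 8 / 2 = 4 draws on average.
indexdict = {0: "a", 7: "b"}
trials = [draw_live_index(indexdict, 8)[1] for _ in range(10000)]
print(sum(trials) / len(trials))  # close to 4, varies from run to run
```

Because freed indices are reused before next_index grows, the ratio only becomes large when many keys are deleted and not replaced.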
Upvotes: 3