Reputation: 601

Python get random key in a dictionary in O(1)

I need a data structure that supports FAST insertion and deletion of (key, value) pairs, as well as "get random key", which does the same thing as random.choice(dict.keys()) for a dictionary. I've searched on the internet, and most people seem to be satisfied with the random.choice(dict.keys()) approach, despite it being linear time.

I'm aware that implementing this faster is possible:

I could use a resizing hash table. If I maintain that the ratio of keys to slots is between 1 and 2, then I can just choose random indices until I hit a non-empty slot. I only look at 1 to 2 keys, in expectation.
I can get these operations in guaranteed worst case O(log n) using an AVL tree, augmenting with rank.

Is there any easy way to get this in Python, though? It seems like there should be!
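For reference, here is a minimal sketch of the linear-time baseline mentioned above, as it would look in Python 3. The dictionary `d` and the flag `direct_choice_works` are made up for illustration. In Python 3, `dict.keys()` returns a view that cannot be indexed, so the keys have to be copied into a list first, and that copy is the O(n) cost.

```python
import random

d = {"a": 1, "b": 2, "c": 3}

# In Python 3, d.keys() is a view that supports len() but not indexing,
# so random.choice cannot use it directly.
try:
    random.choice(d.keys())
except TypeError:
    direct_choice_works = False
else:
    direct_choice_works = True

# Materializing the keys works, but the copy costs O(n) on every call.
key = random.choice(list(d))
```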

Upvotes: 21

Answers (4)

brovia

Reputation: 9

To get O(1) time for every operation, you need an array (a Python list) holding the values and a dictionary mapping each value to its index in the array.

When adding a value, append it to the array and store its index in the dictionary.

Random access is then O(1), since you only need to pick a random position in the array.

When removing a value, look up its index in the dictionary. Unless it is already the last element, overwrite that position in the array with the last value and update the last value's index in the dictionary to the vacated position. Then pop() the last element of the array and delete the removed value's entry from the dictionary, since it no longer belongs there.

import random

class RandomizedSet:

    def __init__(self):
        self.container = []  # values, in arbitrary order
        self.indices = {}    # value -> position in self.container

    def insert(self, val: int) -> bool:
        if val in self.indices:
            return False

        self.indices[val] = len(self.container)
        self.container.append(val)
        return True

    def remove(self, val: int) -> bool:
        if val not in self.indices:
            return False

        idxOfValueToRemove = self.indices[val]
        lastValue = self.container[-1]

        # move the last value into the vacated position (unless val is last)
        if idxOfValueToRemove < len(self.container)-1:
            self.container[idxOfValueToRemove] = lastValue
            self.indices[lastValue] = idxOfValueToRemove

        self.container.pop()

        del self.indices[val]

        return True

    def getRandom(self) -> int:
        # index the list directly; copying it with list() would cost O(n)
        return random.choice(self.container)
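Since the question asks about (key, value) pairs rather than a set, here is a minimal sketch of the same list-plus-dictionary technique applied to a mapping. The class name `RandomKeyDict` and its attribute names are hypothetical, chosen for illustration only.

```python
import random


class RandomKeyDict:
    """Hypothetical key/value variant of the list-plus-dict technique."""

    def __init__(self):
        self._keys = []      # dense list of keys, used for O(1) random access
        self._entries = {}   # key -> [position in self._keys, value]

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, key):
        return self._entries[key][1]

    def __setitem__(self, key, value):
        entry = self._entries.get(key)
        if entry is not None:
            entry[1] = value          # existing key: overwrite the value only
            return
        self._entries[key] = [len(self._keys), value]
        self._keys.append(key)

    def __delitem__(self, key):
        pos, _ = self._entries.pop(key)   # raises KeyError if key is absent
        last_key = self._keys.pop()
        if pos < len(self._keys):         # the deleted key was not in the last slot
            # Move the former last key into the vacated position.
            self._keys[pos] = last_key
            self._entries[last_key][0] = pos

    def random_key(self):
        return random.choice(self._keys)  # raises IndexError when empty


d = RandomKeyDict()
for name in ("a", "b", "c"):
    d[name] = name.upper()
del d["a"]
```

Storing the position next to the value keeps everything in one dictionary lookup per operation; a separate key-to-position dictionary would work equally well.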

Upvotes: 1

natevw

Reputation: 17902

This may not be relevant to the specific use case described above, but this is the question I land on when searching for a way to nicely get hold of "any" key in a dictionary.

If you don't need a truly random choice, but just need some arbitrary key, here are two simple options I've found:

key = next(iter(d))    # may be a little expensive, but presumably O(1)

The second is really useful only if you're happy to consume the key+value from the dictionary, and due to the mutation(s) will not be as algorithmically efficient:

key, value = d.popitem()     # may not be O(1) overall, especially with the re-insert below
if MUST_LEAVE_VALUE:
    d[key] = value
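A short sketch of what these two options return in practice, assuming Python 3.7 or later, where dictionaries preserve insertion order and popitem() removes the most recently inserted pair. The example dictionary is made up. Neither option is random: the result is fully determined by insertion order.

```python
d = {"first": 1, "second": 2, "third": 3}

# Dicts keep insertion order (guaranteed since Python 3.7), so the
# "arbitrary" key from iteration is the earliest key still present.
any_key = next(iter(d))

# popitem() removes and returns the most recently inserted pair.
key, value = d.popitem()
d[key] = value  # put it back if the dictionary must stay unchanged
```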

Upvotes: 5

ninjagecko

Reputation: 91092

[edit: Completely rewritten, but keeping question here with comments intact.]

Below is the realization of a dictionary wrapper with O(1) get/insert/delete, and O(1) picking of a random element.

The main idea is that we want to have an O(1) but arbitrary map from range(len(mapping)) to the keys. This will let us get random.randrange(len(mapping)), and pass it through the mapping.

This is very difficult to implement until you realize that we can take advantage of the fact that the mapping can be arbitrary. The key idea to achieve a hard bound of O(1) time is this: whenever you delete an element, you swap it with the highest arbitrary-id element, and update any pointers.

import random

class RandomChoiceDict(object):
    def __init__(self):
        self.mapping = {}  # wraps a dictionary
                           # e.g. {'a':'Alice', 'b':'Bob', 'c':'Carrie'}

        # the arbitrary mapping mentioned above
        self.idToKey = {}  # e.g. {0:'a', 1:'c', 2:'b'},
                           #      or {0:'b', 1:'a', 2:'c'}, etc.

        self.keyToId = {}  # needed to help delete elements

Get, set, and delete:

    def __getitem__(self, key):  # O(1)
        return self.mapping[key]

    def __setitem__(self, key, value):  # O(1)
        if key in self.mapping:
            self.mapping[key] = value
        else: # new item
            newId = len(self.mapping)

            self.mapping[key] = value

            # add it to the arbitrary bijection
            self.idToKey[newId] = key
            self.keyToId[key] = newId

    def __delitem__(self, key):  # O(1)
        del self.mapping[key]  # O(1) average case
                               # see http://wiki.python.org/moin/TimeComplexity

        emptyId = self.keyToId[key]
        largestId = len(self.mapping)  # about to be deleted
        largestIdKey = self.idToKey[largestId]  # going to store this in empty Id

        # swap deleted element with highest-id element in arbitrary map:
        self.idToKey[emptyId] = largestIdKey
        self.keyToId[largestIdKey] = emptyId

        del self.keyToId[key]
        del self.idToKey[largestId]

Picking a random (key,element):

    def randomItem(self):  # O(1)
        r = random.randrange(len(self.mapping))
        k = self.idToKey[r]
        return (k, self.mapping[k])

Upvotes: 5

Matt

Reputation: 22123

Here is a somewhat convoluted approach:

Assign an index to each key, storing it with the value in the dictionary.
Keep an integer representing the next index (let's call this next_index).
Keep a linked list of removed indices (gaps).
Keep a dictionary mapping the indices to keys.
When adding a key, use (and remove) the first index in the linked list as the index, or if the list is empty, use and increment next_index. Then add the key, value, and index to the dictionary (dictionary[key] = (index, value)) and add the key to the index-to-key dictionary (indexdict[index] = key).
When removing a key, get the index from the dictionary, remove the key from the dictionary, remove the index from the index-to-key dictionary, and insert the index at the front of the linked list.
To get a random key, get a random integer using something like random.randrange(0, next_index). If the index is not in the index-to-key dictionary, retry (this should be rare).

Here is an implementation:

import random

class RandomDict(object):
    def __init__(self): # O(1)
        self.dictionary = {}
        self.indexdict = {}
        self.next_index = 0
        self.removed_indices = None
        self.len = 0

    def __len__(self): # might as well include this
        return self.len

    def __getitem__(self, key): # O(1)
        return self.dictionary[key][1]

    def __setitem__(self, key, value): # O(1)
        if key in self.dictionary: # O(1)
            self.dictionary[key][1] = value # O(1)
            return
        if self.removed_indices is None:
            index = self.next_index
            self.next_index += 1
        else:
            index = self.removed_indices[0]
            self.removed_indices = self.removed_indices[1]
        self.dictionary[key] = [index, value] # O(1)
        self.indexdict[index] = key # O(1)
        self.len += 1

    def __delitem__(self, key): # O(1)
        index = self.dictionary[key][0] # O(1)
        del self.dictionary[key] # O(1)
        del self.indexdict[index] # O(1)
        self.removed_indices = (index, self.removed_indices)
        self.len -= 1

    def random_key(self): # expected O(next_index/len) tries,
        if self.len == 0: # which is usually close to O(1)
            raise KeyError
        while True:
            r = random.randrange(0, self.next_index)
            if r in self.indexdict:
                return self.indexdict[r]
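Each draw in the retry loop succeeds with probability len/next_index, so the number of tries follows a geometric distribution with mean next_index/len. The sketch below checks this empirically. The helper `average_tries` and its parameters are hypothetical, and it assumes for simplicity that the live indices are the lowest ones, which does not change the count.

```python
import random


def average_tries(live_slots, total_slots, trials=20000, seed=0):
    """Average number of draws until a random index lands on a live slot.

    Assumes (for illustration) that the live indices are 0..live_slots-1
    out of total_slots; by symmetry the positions do not affect the count.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tries = 1
        while rng.randrange(total_slots) >= live_slots:
            tries += 1
        total += tries
    return total / trials


# With a quarter of the indices still live, expect about 4 tries per call.
estimate = average_tries(live_slots=25, total_slots=100)
```

The cost only becomes noticeable when the dictionary has shrunk to a small fraction of its peak size, because removed indices are reused before next_index grows.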

Upvotes: 3
