lawful_neutral

Reputation: 673

Reassign dictionary values

I have a dictionary like

{'A': 0, 'B': 1, 'C': 2, 'D': 3, etc}

How can I remove elements from this dictionary without creating gaps in values, in case the dictionary is not ordered?

An example:

I have a big matrix, where rows represent words, and columns represent documents where these words are encountered. I store the words and their corresponding indices as a dictionary. E.g. for this matrix

2 0 0
1 0 3
0 5 1
4 1 2

the dictionary would look like:

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

If I remove the words 'apple' and 'banana', the matrix would contain only two rows. So the value of 'orange' in the dictionary should now equal 0 and not 1, and the value of 'pear' should be 1 instead of 3.

In Python 3.6+ dictionaries are ordered (an implementation detail in CPython 3.6, guaranteed by the language from 3.7 on), so I can just write something like this to reassign the values:

for i, k in enumerate(words):
    # assign through the key; rebinding a loop variable would not change the dict
    words[k] = i

or, alternatively

words = dict(zip(words.keys(), range(matrix.shape[0])))

I think this is far from the most efficient way to change the values, and it wouldn't work with unordered dictionaries. How can I do it efficiently? Is there an easy way to reassign the values if the dictionary is not ordered?

Upvotes: 10

Views: 2913

Answers (5)

user9885031
Initially we have:

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

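To reorder from minimum to maximum, you can use sorted and a dictionary comprehension:

# keys sorted by their current value
std = sorted(words, key=lambda x: words[x])
# enumerate avoids a linear std.index() search for every word
newwords = {word: i for i, word in enumerate(std)}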
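
To tie this back to the question, here is a minimal sketch of the whole round trip: delete the unwanted keys first, then renumber whatever is left by its old index. The helper name `renumber` is hypothetical and only used for illustration.

```python
def renumber(words):
    """Return a new dict whose values are 0..n-1, keeping the old value order."""
    # Sort the keys by their current (possibly gappy) index
    ordered = sorted(words, key=words.get)
    return {word: i for i, word in enumerate(ordered)}


words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
for word in ('apple', 'banana'):
    del words[word]          # leaves {'orange': 1, 'pear': 3}

words = renumber(words)
print(words)                 # {'orange': 0, 'pear': 1}
```

Since the sort key is the stored value, the result should not depend on the order in which the dictionary happens to iterate.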

Upvotes: 3

Uri Goren

Reputation: 13690

You are using the wrong tool (dict) for the job; you should use a list:

class vocabulary:
    def __init__(self, *words):
        self.words = list(words)

    def __getitem__(self, key):
        # a word's index is simply its position in the list
        try:
            return self.words.index(key)
        except ValueError:
            raise KeyError(key + " is not in vocabulary") from None

    def remove(self, word):
        # accept either an index or the word itself
        if isinstance(word, int):
            del self.words[word]
            return
        self.remove(self[word])

words = vocabulary("apple", "banana", "orange")
print(words["banana"])  # outputs 1
words.remove("apple")
print(words["banana"])  # outputs 0

A note on complexity

I had several comments mentioning that a dict is more efficient because its lookup time is O(1) while the lookup time of a list is O(n).

This is simply not true in this case.

The O(1) guarantee of a hash table (dict in Python) is an amortised complexity: it averages over the common usage of a lookup table that is generated once, and it assumes that your hash function is balanced.

This amortised calculation does not take into account deleting the entire dictionary and regenerating it every time you remove an item, as some of the other answers suggest.

The list implementation and the dict implementation have the same worst-case complexity of O(n).

Yet the list implementation could be optimised with two lines of Python (bisect) to have a worst-case lookup complexity of O(log(n)).
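
As a rough sketch of what the bisect idea could look like: `bisect` only works on a sorted list, so this assumes the words are kept in alphabetical order and that the matrix rows are stored in that same order. The class name `SortedVocabulary` is hypothetical and not part of the code above.

```python
import bisect


class SortedVocabulary:
    """Hypothetical variant: words are kept in alphabetical order."""

    def __init__(self, *words):
        self.words = sorted(set(words))

    def __getitem__(self, word):
        # Binary search: O(log n) comparisons
        i = bisect.bisect_left(self.words, word)
        if i == len(self.words) or self.words[i] != word:
            raise KeyError(word)
        return i

    def remove(self, word):
        # Finding the position is O(log n); `del` on a list still
        # shifts the elements that follow it
        del self.words[self[word]]


vocab = SortedVocabulary("apple", "banana", "orange", "pear")
print(vocab["orange"])   # 2
vocab.remove("apple")
vocab.remove("banana")
print(vocab["orange"])   # 0
print(vocab["pear"])     # 1
```

The indices here reflect alphabetical position rather than insertion order, which is the price of being able to binary-search.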

Upvotes: 2

iacob

Reputation: 24231

You can use your existing logic, using a representation of the dictionary that is sorted:

import operator

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
sorted_words = sorted(words.items(), key=operator.itemgetter(1))

for i, (k, v) in enumerate(sorted_words):
    words[k] = i
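
Because the sort key is the stored index rather than the dictionary's iteration order, this should behave the same for an unordered dictionary. A small check, using made-up data with a scrambled insertion order and the gaps a removal would leave behind:

```python
import operator

# Keys inserted in a scrambled order; indices 0 and 2 are missing,
# as if 'apple' and 'banana' had already been deleted
words = {'pear': 3, 'orange': 1}

sorted_words = sorted(words.items(), key=operator.itemgetter(1))
for i, (k, _) in enumerate(sorted_words):
    # we iterate over the sorted list, so assigning into words is safe
    words[k] = i

print(words)   # {'pear': 1, 'orange': 0}
```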

Upvotes: 3

RoadRunner

Reputation: 26315

You could always keep an inverted dictionary that maps indices to words, and use that as a reference for keeping the order of the original dictionary. Then you could remove the words, and rebuild the dictionary again:

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

# reverse dict for index -> word mappings
inverted = {i: word for word, i in words.items()}

remove = {'apple', 'banana'}

# sort/remove the words
new_words = [inverted[i] for i in range(len(inverted)) if inverted[i] not in remove]

# rebuild new dictionary
new_dict = {word: i for i, word in enumerate(new_words)}

print(new_dict)

Which outputs:

{'orange': 0, 'pear': 1}

Note: Like the accepted answer, this is also O(n).
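
The same inverted mapping can also be used to drop the matching rows, so the matrix and the dictionary stay in sync. This is only a sketch: it assumes the matrix from the question is stored as a plain list of rows, and the names `kept`, `new_matrix` and `new_words` are made up for the example.

```python
matrix = [
    [2, 0, 0],   # apple
    [1, 0, 3],   # orange
    [0, 5, 1],   # banana
    [4, 1, 2],   # pear
]
words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
remove = {'apple', 'banana'}

# index -> word, independent of the dict's iteration order
inverted = {i: word for word, i in words.items()}

# old row indices to keep, in their original order
kept = [i for i in range(len(matrix)) if inverted[i] not in remove]

new_matrix = [matrix[i] for i in kept]
new_words = {inverted[i]: new_i for new_i, i in enumerate(kept)}

print(new_matrix)  # [[1, 0, 3], [4, 1, 2]]
print(new_words)   # {'orange': 0, 'pear': 1}
```

With a NumPy array, the row selection could presumably be written as `matrix[kept]` instead of the list comprehension.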

Upvotes: 3

Aran-Fey

Reputation: 43196

Turn the dict into a sorted list and then build a new dict without the words you want to remove:

import itertools

to_remove = {'apple', 'banana'}

# Step 1: sort the words
ordered_words = [None] * len(words)
for word, index in words.items():
    ordered_words[index] = word
# ordered_words: ['apple', 'orange', 'banana', 'pear']

# Step 2: Remove unwanted words and create a new dict
counter = itertools.count()
words = {word: next(counter) for word in ordered_words if word not in to_remove}
# result: {'orange': 0, 'pear': 1}

This has a runtime of O(n) because manually ordering the list with indexing operations is a linear operation, as opposed to sorted which would be O(n log n).

See also the documentation for itertools.count and next.
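
For readers unfamiliar with the two, here is a short illustration of why the counter leaves no gaps: in a comprehension with a filter, `next(counter)` is only evaluated for the items that pass the `if`. The word list is sample data for the demonstration.

```python
import itertools

counter = itertools.count()      # yields 0, 1, 2, ... on demand
print(next(counter))  # 0
print(next(counter))  # 1

# A fresh counter: next() runs only for words that survive the filter
counter = itertools.count()
sample = ['apple', 'orange', 'banana', 'pear']
kept = {w: next(counter) for w in sample if w != 'banana'}
print(kept)  # {'apple': 0, 'orange': 1, 'pear': 2}
```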

Upvotes: 8
