Reputation: 7922
In this answer, I am creating a dict
of empty set
s by iterating through a list. Then I iterate through the same list & filling those sets. An MRE:
# imports we need
import time
import numpy as np
np.random.seed(42)
An example list of letters. Note that at least one letter will appear more than once.
letters=[np.random.choice([letter for letter in string.ascii_lowercase]) for _ in range(1000)]
Result:
['w', 'n', 'k', 'o', 'm', 'r', ...
Creating a dict
with letters as keys, empty sets as values:
letterdict={letter:set() for letter in letters}
Iterating through the letters
list again, each entry in the list with the corresponding letter will be a set where the indices of that letter appears in the letters
list:
for index, letter in enumerate(letters):
letterdict[letter].add(index)
letterdict
will look like:
{'w': {0, 12, 62, 67, 69, ...
This process took:
start = time.time()
letterdict={letter:set() for letter in letters}
for index, letter in enumerate(letters):
letterdict[letter].add(index)
end = time.time()
print(end-start)
0.000538...
sec.
Is there a way to make the creation of letterdict
quicker? Afterall, I am iterating through letters
twice.
My thoughts: If I could make it in one loop, when it encounters a letter for the first time, I could create a set
, and put the index of the letter in it. When encountering the letter for the second time, it could not reset the set
, just add the index. However, checking that a letter is already encountered or not seems tedious (ie slow).
In the MRE, assume, that we do not know what all the letters are, so replacing the first loop by {letter:set() for letter in string.ascii_lowercase}
is not really useful.
Upvotes: 0
Views: 183
Reputation: 11681
You probably want a collections.defaultdict
.
This will create the empty set on lookup if the key is not already present:
from collections import defaultdict
letterdict = defaultdict(set)
for index, letter in enumerate(letters):
letterdict[letter].add(index)
This way you don't have to initialize the dictionary with empty sets.
You could instead just use the .setdefault()
method on a normal dict.
letterdict = {}
for index, letter in enumerate(letters):
letterdict.setdefault(letter, set()).add(index)
This has the overhead of creating a new set object each time whether it's needed or not, but the builtin set()
types seem to construct pretty quickly. It wasn't any slower on the small samples I used to time it. CPython might pool these somehow when they get deleted quickly, like it does for tuples.
Upvotes: 1
Reputation: 1644
Okay, so I think there is no significant gain in time with this method, but you can do the same job with one loop, by using a try except block.
start = time.time()
for index, letter in enumerate(letters):
try:
letterdict[letter].add(index)
except:
letterdict[letter] = {index}
end = time.time()
print(end-start)
Basically, we create a set while looping though the list, therefore no need for another loop.
My time, first is your method, second is mine:
0.0009996 #Your method
0.0009992 #My method
As I said, no significant gain, but according to several runs, it slightly faster.
Upvotes: 1