Oleg Dats
Oleg Dats

Reputation: 4133

What is the most efficient way to add keys with empty values to dictionary?

I use ordered_set = dict() to create ordered set. The values are not important. Test is whether the key was already added only. Ordered set is a set of unique values where the order of adding is maintained.

if node not in ordered_set:
  ordered_set[node] = None # ordered_set[node] = False or ordered_set[node] = '' or ordered_set[node] = 0
  f1()
else:
  f2()
      

What is the most efficient way to add keys with empty values to dictionary? Maybe there is a better way to implement ordered set?

Upvotes: 0

Views: 286

Answers (1)

E. Ducateme
E. Ducateme

Reputation: 4248

This is an approach. In later versions of Python (3.6+, I think), the order with which keys are added to a dictionary is maintained (the order of the keys in the dictionary could not be assured in earlier versions of Python).

The method .setdefault() is built into Python dictionaries and is designed for the use case you are faced with: check to see if a key exists and if not, create the key and set a default value.

mydict = {1: True, 2:True, 5: True}

for key in [1, 2, 3, 4, 5]:
    mydict.setdefault(key, None)

yields

{1: True, 2: True, 5: True, 3: None, 4: None}

In the above, notice that the order of the keys is maintained ... the first three keys already existed and other keys (3 and 4) are added in a specific order and the order is maintained.

Because the default behavior of .setdefault() is to set the new key to None automagically, so we can simplify the code in this way:

for key in [1, 2, 3, 4, 5]:
    mydict.setdefault(key)

Measuring efficiency:

One of the comments raises a great point about "efficiency" and not gonna lie, my approach was aimed at the developer's efficiency (i.e. using built in tools to keep code shorter, simpler, easier to understand). But let's look at what is happening under the hood in terms of code efficiency (i.e. time to compute):

Here is a simple benchmark that shows the relative speed of computation. We start by making mydict and a version of mylist with a much larger number of entries, most of which will not be present in the dict.

In [26]: %%timeit mydict = {1: True, 2:True}; mylist = list(range(1000000))
    ...: for key in mylist:
    ...:     if key not in mydict:
    ...:         mydict[key] = None
    ...: 
29.6 ms ± 613 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [28]: %%timeit mydict = {1: True, 2:True}; mylist = list(range(1000000))
    ...: for key in mylist:
    ...:     mydict.setdefault(key)
    ...: 
49.8 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

We see that for the code that tests for the presence of the key in mydict first, we get a shorter time to complete. If we use the builtin method .setdefault() we see a longer time to complete (potentially because the function call might be expensive).

So in terms of raw time to compute, the original poster's method appears more efficient in terms of time and the use of the builtin .setdefault() might be considered by some to be more efficient in terms of developer time/mental effort. YMMV.

Upvotes: 2

Related Questions