Hubert Janczak
Hubert Janczak

Reputation: 25

Why is copying dictionaries so slow?

I was solving one of Advent of Code questions and it turned out that my runtime was way too bad. I was using a dictionary with 1 million entries in it as a kind of "linked hashmap" where I can access item and it's value is the next item it's pointing to, so I had a linked list with O(1) lookup time. At some point I had to copy this dict, whch I did with simple

new_dict = dict(dictionary)

After playing with my code a bit, I removed this copying part and just ran the entire code on one dictionary. It suddenly sped up my runtime by orders of magnitude! Why did this part of my code create such a bottleneck?

Upvotes: 0

Views: 539

Answers (1)

flakes
flakes

Reputation: 23634

This is actually kinda interesting. Both dictionary.copy() and dict(dictionary) will make a shallow copy of the data in dictionary, however, one seems to run significantly faster.

I'm not exactly sure why this is, but I felt it would be interesting to post my findings. Perhaps someone can enlighten me in the comments:

import timeit

data = {i: i for i in range(10000)}

print(timeit.timeit(lambda: data.copy(), number=20000))
print(timeit.timeit(lambda: dict(data), number=20000))
0.8178407099999999
3.9644210609999995

If I had to guess I would say that dict(dictionary) is going over each key value pair to reconstruct the hash table, and dictionary.copy() is probably just copying the hash table and skipping a lot of the work required.

I would guess this is because the dict(d) builtin can be passed any object which implements the dictionary interface and that interface might not use the same hash table implementation, requiring that it be reconstructed when copying data.

Upvotes: 3

Related Questions