Reputation: 735
I'm trying to write the result of intersecting two dictionaries to a txt file. From what I've found, the usual way to do this is by intersecting the keys, but that won't work with my dictionaries, since I need to match on the values.
Example of my dictionaries:
dA = {'1':'aaa','2':'aaa','3':'bbb'}
dB = {'10':'aaa','11':'aaa','12':'bbb'}
And this is the output I need inside the txt file:
1 10
1 11
2 10
2 11
3 12
Note: my dictionaries have ~100,000,000 entries each
This is my code:
>>> for key, value in dA.items():
...     for bkey, bvalue in dB.items():
...         if bvalue == value:
...             print(key, bkey)
Upvotes: 1
Views: 1045
Reputation: 211980
Here's a faster method, which produces unsorted output.
from itertools import product
from collections import defaultdict

da = {'1':'aaa','2':'aaa','3':'bbb'}
db = {'10':'aaa','11':'aaa','12':'bbb'}

def gen_matches():
    # Invert each dict: map every value to the list of keys that hold it.
    map_a = defaultdict(list)
    map_b = defaultdict(list)
    for key, value in da.items():
        map_a[value].append(key)
    for key, value in db.items():
        map_b[value].append(key)
    # For every value present in both dicts, yield each key pairing.
    for key in map_a:
        if key in map_b:
            for x in product(map_a[key], map_b[key]):
                yield x

for match in gen_matches():
    print(match)
Output
('2', '11')
('2', '10')
('1', '11')
('1', '10')
('3', '12')
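Since the goal is a txt file rather than printed output, you can stream the generator straight to disk; a minimal sketch (the filename matches.txt is just a placeholder):

with open('matches.txt', 'w') as f:
    # gen_matches() yields (key_from_a, key_from_b) tuples
    for a_key, b_key in gen_matches():
        f.write('{} {}\n'.format(a_key, b_key))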
This is O(n+m), which means it only needs to look at each element in each dictionary once. (We call the size of dictionary A "n" and the size of dictionary B "m".)
The original method is O(n*m): every time you look at an element of A, you look at every element of B.
So you can get an idea of how long the two approaches will take by substituting numbers. If dicts A and B both contain 1000 elements, this version will take about 2000 time units, and the original would take about 1,000,000!
Big-O notation is a way to estimate the complexity of an algorithm. I've linked you to a good beginner's guide; the Wikipedia article is unfortunately rather hard to read.
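If you want to check the scaling empirically, here is a rough timing sketch; the dict sizes, the random value pool, and the function names are illustrative assumptions, not part of the answer:

import random
import time
from collections import defaultdict
from itertools import product

n = 2000  # arbitrary size for the demo
dict_a = {str(i): random.randrange(n) for i in range(n)}
dict_b = {str(i + n): random.randrange(n) for i in range(n)}

def nested_pairs():
    # O(n*m): compares every entry of dict_a against every entry of dict_b
    return [(ka, kb) for ka, va in dict_a.items()
                     for kb, vb in dict_b.items() if va == vb]

def indexed_pairs():
    # O(n+m): invert both dicts, then pair up keys that share a value
    map_a, map_b = defaultdict(list), defaultdict(list)
    for k, v in dict_a.items():
        map_a[v].append(k)
    for k, v in dict_b.items():
        map_b[v].append(k)
    return [p for v in map_a if v in map_b
              for p in product(map_a[v], map_b[v])]

for fn in (nested_pairs, indexed_pairs):
    start = time.perf_counter()
    fn()
    print(fn.__name__, time.perf_counter() - start)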
Upvotes: 4