1pa
1pa

Reputation: 735

Intersection between values in python dictionaries (faster way)

I'm trying to do a txt file with a result from an intersection of two dictionaries. I was searching and I know the best way to do this is with intersection of keys but with my dictionaries I can not do this.

Example of my dictionaries:

dA = {'1':'aaa','2':'aaa','3':'bbb'}
dB = {'10':'aaa','11':'aaa','12':'bbb'}

And this is the output I need inside the txt file:

1 10
1 11
2 10
2 11
3 12

Note: my dictionaries have ~100.000.000 entries each

This is my code:

>>> for key, value in da.items():
...     for bkey, bvalue in db.items():
...             if bvalue == value:
...                     print(key, bkey)

Upvotes: 1

Views: 1045

Answers (1)

Kenan Banks
Kenan Banks

Reputation: 211980

A faster method, which produces unsorted output.

from itertools import product
from collections import defaultdict

da = {'1':'aaa','2':'aaa','3':'bbb'}
db = {'10':'aaa','11':'aaa','12':'bbb'}

def gen_matches():
  map_a = defaultdict(list)
  map_b = defaultdict(list)

  for key, value in da.items():
    map_a[value].append(key)
  for key, value in db.items():
    map_b[value].append(key)

  for key in map_a:
    if key in map_b:
      for x in product(map_a[key], map_b[key]):
        yield x

for match in gen_matches():
  print(match)

Output

('2', '11')
('2', '10')
('1', '11')
('1', '10')
('3', '12')

This is O(n+m), which means it only needs to look at each element in each dictionary one time. We call the size of dictionary A "n" and the size of dictionary B "m".

The original method, is O(n*m). Every time you look at an element of A, you look at every other element in B.

So, you can get an idea of how long the two approaches will take by substituting numbers. If dicts A and B both contain 1000 elements, this version will take about 2000 time units, and the original would take 1,000,000!.

Big-O notation is a way to estimate the complexity of an algorithm. I've linked you to a good beginner's guide; the wikipedia article is unfortunately rather hard to read.

Upvotes: 4

Related Questions