Combine python dictionaries that share values and keys

Question

I am doing some entity matching based on string edit distance and my results are a dictionary with keys (query string) and values [list of similar strings] based on some scoring criteria.

for example:

results = {
  'ben' : ['benj', 'benjamin', 'benyamin'],
  'benj': ['ben', 'beny', 'benjamin'],
  'benjamin': ['benyamin'],
  'benyamin': ['benjamin'],
  'carl': ['karl'],
  'karl': ['carl'],
}

Each value also has a corresponding dictionary item, for which it is the key (e.g. 'carl' and 'karl').

I need to combine the elements that share values, choosing one value as the new key (let's say the longest string). In the above example I would hope to get:

results = {
  'benjamin': ['ben', 'benj', 'benyamin', 'beny', 'benjamin', 'benyamin'],
  'carl': ['carl','karl']
}

I have tried iterating through the dictionary using the keys, but I can't wrap my head around how to iterate and compare through each dictionary item and its list of values (or single value).

jpp · Accepted Answer

This is one solution using collections.defaultdict and sets.

The output is very close to what you asked for (the values are sets rather than lists) and can easily be converted to match.

from collections import defaultdict

results = {
  'ben' : ['benj', 'benjamin', 'benyamin'],
  'benj': ['ben', 'beny', 'benjamin'],
  'benjamin': 'benyamin',
  'benyamin': 'benjamin',
  'carl': 'karl',
  'karl': 'carl',
}

d = defaultdict(set)

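for i, (k, v) in enumerate(results.items()):
    # Build a set of the key plus its value(s); a value may be a list or a single string.
    w = {k} | (set(v) if isinstance(v, list) else {v})
    for m, n in d.items():
        # Merge into the first existing group that shares a member.
        if not n.isdisjoint(w):
            d[m].update(w)
            break
    else:
        # No overlap found: start a new group keyed by the enumeration index.
        d[i] = w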

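# Key each group by its longest member. Ties (e.g. 'benjamin' vs 'benyamin') are
# broken alphabetically, because set iteration order is not stable between runs.
result = {min(v, key=lambda s: (-len(s), s)): v for v in d.values()}

# {'benjamin': {'ben', 'benj', 'benjamin', 'beny', 'benyamin'},
#  'carl': {'carl', 'karl'}}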
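
If you need lists rather than sets, as in the question, a short conversion along these lines should do it. This is only a sketch: `grouped` and `as_lists` are made-up names, and `grouped` is a hand-written copy of the output shown above so that the snippet runs on its own.

```python
# Hypothetical stand-in for the grouped output above (values are sets).
grouped = {
    'benjamin': {'ben', 'benj', 'benjamin', 'beny', 'benyamin'},
    'carl': {'carl', 'karl'},
}

# Sets are unordered, so sort each one to get a reproducible list.
as_lists = {key: sorted(members) for key, members in grouped.items()}

print(as_lists)
# {'benjamin': ['ben', 'benj', 'benjamin', 'beny', 'benyamin'], 'carl': ['carl', 'karl']}
```

Sorting is optional; `list(members)` works too if the order of the names does not matter.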

Credit to @IMCoins for the idea of manipulating v to w in second loop.

Explanation

There are 3 main steps:

1. Convert each entry into a set containing its key and all of its values, so that every entry has a consistent format.
2. Loop through these sets and add them to a new dictionary. If a set overlaps with one already stored under some key (i.e. the sets are not disjoint), merge it into that key's set. Otherwise, store it under a new key taken from the enumeration index.
3. Build the result dictionary in a final transformation, keying each set by its longest member.
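
One thing to be aware of: the loop merges each new set into the first overlapping group only, so if a later entry links two groups that already exist, they stay separate. If your data can contain such chains, a union-find (disjoint-set) approach is one possible alternative. The sketch below is not part of the original solution; `merge_groups` and its helpers are hypothetical names, and it assumes values are either a list of strings or a single string.

```python
def merge_groups(results):
    """Group keys and values into connected components (union-find sketch)."""
    parent = {}

    def find(x):
        # Register unseen names as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for key, value in results.items():
        members = value if isinstance(value, list) else [value]
        find(key)  # make sure keys with no values still form a group
        for member in members:
            union(key, member)

    # Collect every name under the root of its component.
    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)

    # Key each group by its longest member, breaking ties alphabetically.
    return {min(g, key=lambda s: (-len(s), s)): g for g in groups.values()}


# The third entry links the two groups created by the first two.
chained = {'ann': ['anne'], 'hanna': ['hannah'], 'anne': ['hanna']}
print(merge_groups(chained))
```

Here all four names end up in a single group keyed by 'hannah', whereas the first-match loop would leave 'hanna' and 'hannah' in a group of their own.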
