Reputation: 96302
Say I have a dictionary of lists in Python. I would like to find all the groups of keys that have items in common, and for each such group, the corresponding items.
For example, assuming that the items are simple integers:
dct = dict()
dct['a'] = [0, 5, 7]
dct['b'] = [1, 2, 5]
dct['c'] = [3, 2]
dct['d'] = [3]
dct['e'] = [0, 5]
The groups would be:
groups = dict()
groups[0] = ['a', 'e']
groups[1] = ['b', 'c']
groups[2] = ['c', 'd']
groups[3] = ['a', 'b', 'e']
And the elements in common for those groups would be:
common = dict()
common[0] = [0, 5]
common[1] = [2]
common[2] = [3]
common[3] = [5]
To solve this problem, I believe that there is value in building a matrix like the one below, but I am not sure how to proceed from this point. Are there any Python libraries that facilitate solving this type of problem?
  | a b c d e |
|a| x       x |
|b|   x x   x |
|c|   x x x   |
|d|     x x   |
|e| x x     x |
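Just to make the idea concrete, here is a rough sketch (names purely illustrative, not from any particular library) of how such a pairwise matrix could be built with plain sets and itertools:

from itertools import combinations

dct = {'a': [0, 5, 7], 'b': [1, 2, 5], 'c': [3, 2], 'd': [3], 'e': [0, 5]}

# For every pair of keys, record the items they share (if any).
pair_overlap = {}
for k1, k2 in combinations(sorted(dct), 2):
    overlap = set(dct[k1]) & set(dct[k2])
    if overlap:
        pair_overlap[(k1, k2)] = overlap

# Print a small marker matrix similar to the one above.
keys = sorted(dct)
print('  ' + ' '.join(keys))
for k1 in keys:
    row = ['x' if (k1 == k2 or (min(k1, k2), max(k1, k2)) in pair_overlap)
           else ' ' for k2 in keys]
    print(k1 + ' ' + ' '.join(row))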
I tried to wrap the solution that @NickBurns provided in a function, but I am having trouble reproducing his results:
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
groups, common_items = get_groups(dct)
print 'Groups:', groups
print 'Common items:', common_items
I get:
Groups: defaultdict(<type 'list'>, {0: ['a', 'e'], 2: ['c', 'b'], 3: ['c', 'd'], 5: ['a', 'b', 'e']})
Common items: {0: None, 2: None, 3: None, 5: None}
And here are the functions:
from collections import defaultdict

def common(query_group, dct):
    """ Recursively find the common elements within groups """
    if len(query_group) <= 1:
        return
    # Extract the elements from groups,
    # Pull their original values from dct
    # Get the intersection of these
    first, second = set(dct[query_group[0]]), set(dct[query_group[1]])
    # print(first.intersection(second))
    return common(query_group[2:], dct)
def get_groups(dct):
    groups = defaultdict(list)
    for key, values in dct.items():
        for value in values:
            groups[value].append(key)
    # Clean up the groups: keep only values that are common to more than 1 key
    for key in groups.keys():
        if len(groups[key]) <= 1:
            del groups[key]
    # Identify common elements:
    common_items = dict()
    for k, v in groups.iteritems():
        if len(v) > 1:
            common_items[k] = common(v, dct)
    return groups, common_items
Upvotes: 2
Views: 1701
Reputation: 983
I would try to create a second dictionary (groups) that represents the intersection of each list in the original dct. For example, you could do this using a defaultdict, something like:
from collections import defaultdict
groups = defaultdict(list)
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
for key, values in dct.items():
    for value in values:
        groups[value].append(key)

for key in groups.keys():
    if len(groups[key]) > 1:  # i.e. the value is common to more than 1 group
        print(key, groups[key])
(0, ['a', 'e'])
(2, ['c', 'b'])
(3, ['c', 'd'])
(5, ['a', 'b', 'e'])
Finding the common elements is a little messier: you need to run through each group and find the intersection from the original dct. Perhaps a recursive routine like this would work:
def common(query_group, dct, have_common=[]):
    """ Recursively find the common elements within groups """
    if len(query_group) <= 1:
        return have_common
    # extract the elements from groups, and pull their original values from dct
    # then get the intersection of these
    first, second = set(dct[query_group[0]]), set(dct[query_group[1]])
    have_common.extend(first.intersection(second))
    return common(query_group[2:], dct, have_common)
for query_group in groups.values():
    if len(query_group) > 1:
        print(query_group, '=>', common(query_group, dct, have_common=[]))
['e', 'a'] => [0, 5]
['b', 'c'] => [2]
['d', 'c'] => [3]
['e', 'b', 'a'] => [5]
Clearly it needs some prettier formatting, but I think it gets the job done. Hopefully that helps.
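As a follow-up on the design: if the recursion feels heavy, the same intersection can be sketched non-recursively by intersecting the original lists of every key in a group directly (the helper name common_items below is only illustrative):

def common_items(query_group, dct):
    """ Intersect the original value lists of every key in the group """
    return sorted(set.intersection(*(set(dct[key]) for key in query_group)))

for query_group in groups.values():
    if len(query_group) > 1:
        print(query_group, '=>', common_items(query_group, dct))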
Upvotes: 3
Reputation: 67427
This is a big mess, but it works. It is basically building an array like this:
  | 0 1 2 3 4 5 6 7 |
  +-----------------+
|a| 1 0 0 0 0 1 0 1 |
|b| 0 1 1 0 0 1 0 0 |
|c| 0 0 1 1 0 0 0 0 |
|d| 0 0 0 1 0 0 0 0 |
|e| 1 0 0 0 0 1 0 0 |
The groups are the unique columns with more than one 1. To find all the elements common to a group, you find the columns that have 1s in the places where the group definition has 1s. Writing it in Python, with a hacky use of scipy's sparse matrices to build the above array, I got the following:
import numpy as np
import scipy.sparse as sps
dct = {'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2],
'd' : [3], 'e' : [0, 5]}
keys = []
lens = []
vals = []
for key, items in dct.items():
    keys.append(key)
    lens.append(len(items))
    vals.extend(items)
keys = np.array(keys)
lens = np.array(lens)
vals = np.array(vals)
unique_values, val_idx = np.unique(vals, return_inverse=True)
data = np.ones_like(val_idx)
indices = val_idx
indptr = np.concatenate(([0], np.cumsum(lens)))
dct_array = sps.csr_matrix((data, indices, indptr))
dct_array = dct_array.T.toarray()
mask = dct_array.sum(axis=-1) >= 2
dct_array = dct_array[mask].astype(np.bool)
unique_values = unique_values[mask]
dct_array = np.ascontiguousarray(dct_array)
dct_array = dct_array.view((np.void,
                            (dct_array.dtype.itemsize *
                             len(keys)))).ravel()
groups, grp_idx = np.unique(dct_array,
                            return_index=True)
groups = groups.view(np.bool).reshape(-1, len(keys))
dct_array = dct_array.view(np.bool).reshape(-1, len(keys))
for group, idx in zip(groups, grp_idx):
    print 'group {0}'.format(keys[group])
    common = unique_values[np.all(np.logical_and(dct_array[idx],
                                                 dct_array) ==
                                  dct_array[idx], axis=-1)]
    print 'common {0}'.format(common)
This prints out:
group ['c' 'd']
common [3]
group ['c' 'b']
common [2]
group ['a' 'e']
common [0 5]
group ['a' 'b' 'e']
common [5]
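The unique-column reasoning can also be sketched without numpy or scipy by mapping each value to the set of keys that contain it and comparing those key sets; this is only a rough pure-Python equivalent of the idea, not the code above:

from collections import defaultdict

dct = {'a': [0, 5, 7], 'b': [1, 2, 5], 'c': [3, 2], 'd': [3], 'e': [0, 5]}

# value -> set of keys containing it (one "column" of the array above)
value_keys = defaultdict(set)
for key, items in dct.items():
    for item in items:
        value_keys[item].add(key)

# the distinct key sets with more than one key are the groups ("unique columns")
groups = set(frozenset(ks) for ks in value_keys.values() if len(ks) > 1)

for keyset in groups:
    # common elements: values whose key set covers the whole group
    common = sorted(v for v, ks in value_keys.items() if keyset <= ks)
    print('group {0}'.format(sorted(keyset)))
    print('common {0}'.format(common))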
Upvotes: 1
Reputation: 14098
This is pretty close to what you're asking for - take a look at it, and see if it's close enough.
from collections import defaultdict
dct = dict()
dct['a'] = [0, 5, 7]
dct['b'] = [1, 2, 5]
dct['c'] = [3, 2]
dct['d'] = [3]
dct['e'] = [0, 5]
inverseDict = defaultdict(list)
for key in dct:
    for item in dct[key]:
        inverseDict[item].append(key)

for item in inverseDict.keys():
    if len(inverseDict[item]) < 2:
        del inverseDict[item]

for item in inverseDict:
    print item, ":", inverseDict[item]
Output:
0 : ['a', 'e']
2 : ['c', 'b']
3 : ['c', 'd']
5 : ['a', 'b', 'e']
Upvotes: 2
Reputation: 250971
You can use the NetworkX library to get that matrix (adjacency matrix) representation:
import networkx as nx
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
nodes = sorted(dct)
G = nx.Graph()
for node in nodes:
    attached_nodes = dct[node]
    G.add_node(node)
    for nod in attached_nodes:
        if 0 <= nod < len(nodes):
            G.add_edge(node, nodes[nod])
print G.nodes()
print G.edges()
print G.has_edge('a','b')
print G.has_edge('b','c')
Output:
['a', 'c', 'b', 'e', 'd']
[('a', 'a'), ('a', 'e'), ('c', 'c'), ('c', 'b'), ('c', 'd'), ('b', 'b'), ('d', 'd')]
False
True
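To see the adjacency matrix itself rather than just the edge list, the graph can be converted to a matrix as well; a short sketch, assuming the G built above and a NetworkX release that still provides to_numpy_matrix (newer releases use nx.to_numpy_array instead):

order = sorted(G.nodes())
adj = nx.to_numpy_matrix(G, nodelist=order)  # rows/columns follow `order`
print order
print adj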
Upvotes: 2