Reputation: 563
I am concerned about a question I have that is not so obvious to resolve.
If we have a dictionary such as:
dictionary = {"first":["a", "b", "c"],
"second":["a", "c", "d", "b", "e"],
"third":["a", "e", "f"],
"four":["f", "b", "c"],
"five":["b", "e", "g"]}
What is the most efficient way to remove the key-value pair for each key whose values are contained in other keys?
Where a value is contained in another subset of values.
dictionary = {"first":["a", "b", "c"], # a, b or c are all three contained in other subsets of values
"second":["a", "c", "d", "b", "e"], # d is not contained anywhere else
"third":["a", "e", "f"], #the three values of "third" and "four" are contained in other subsets of values, f is only in these two subset, one of the two pairs is deleted (so for exemple "four")
"four":["f", "b", "c"],
"five":["b", "e", "g"]} # g is not contained anywhere else
To have:
new_dictionary = {"second":["a", "c", "d", "b", "e"],
"third":["a", "e", "f"],
"five":["b", "e", "g"]}
Upvotes: 0
Views: 756
Reputation: 42143
You can get the desired result by processing your dictionary in decreasing order of list size. Collect the letter values in a set as you go to check if the values currently included in the result already contain all of the values for the next key.
dictionary = {"first":["a", "b", "c"],
"second":["a", "c", "d", "b", "e"],
"third":["a", "e", "f"],
"four":["f", "b", "c"],
"five":["b", "e", "g"]}
new_dict = { k:v for seen in [set()]
for k,v in sorted(dictionary.items(),key=lambda kv:-len(kv[1]))
if not (seen.issuperset(v),seen.update(v))[0] }
print(new_dict)
{'second': ['a', 'c', 'd', 'b', 'e'],
'third': ['a', 'e', 'f'],
'five': ['b', 'e', 'g']}
Upvotes: 1
Reputation: 24691
Based on my understanding of the problem, here's a naive approach:
import itertools
dictionary = {"first":["a", "b", "c"],
"second":["a", "c", "d", "b", "e"],
"third":["a", "e", "f"],
"four":["f", "b", "c"],
"five":["b", "e", "g"]}
# mutate destructively, so create a new dict so as to preserve the original
# we will remove keys while iterating over the new dictionary
# this is usually not recommended, but in this case we need to
# check against already-deleted values
new_dictionary = dict(dictionary)
# instead of iterating directly through new_dictionary.items(),
# pipe it through the list() constructor to force all the items out before
# we delete anything. If we were to pop an element before
# `new_dictionary.items()` got to it, then we might have glitchy
# behavior. This ensures that all the key-value pairs are taken before
# anything is removed
for key, value in list(new_dictionary.items()):
# take all other items currently in the dictionary (not popped yet),
# flatten their value lists using itertools.chain(), and then check
# if this value is a subset of that.
set_of_other_items = set(itertools.chain(*[
v for (k, v) in new_dictionary.items() if k != key
]))
if set(value).issubset(set_of_other_items):
new_dictionary.pop(key)
print(new_dictionary)
# {'five': ['b', 'e', 'g'],
# 'four': ['f', 'b', 'c'],
# 'second': ['a', 'c', 'd', 'b', 'e']}
This isn't super efficient (O(n^2)
so there are probably better ways), but it should work. Additionally, as of python 3.7, dict keys are returned from .items()
in the order they were inserted/defined, so the ordering should work out properly.
In this case, 'third'
is removed instead of 'four'
because it comes first. If you want to do the opposite, you can iterate over list(reversed(new_dictionary.items()))
instead.
Upvotes: 2
Reputation: 9857
Still not sure I follow exactly but this code will remove an item from the dictionary if all it's values are found in the values of the other items.
dictionary = {
"first": ["a", "b", "c"],
"second": ["a", "c", "d", "b", "e"],
"third": ["a", "e", "f"],
"four": ["f", "b", "c"],
"five": ["b", "e", "g"],
}
new_dic = {}
for ky, val in dictionary.items():
all_vals = [
item for sublist in [items for key, items in dictionary.items() if key !=ky] for item in sublist]
if not (all(x in all_vals for x in val)):
new_dic[ky] = val
print(new_dic)
Sample Output
{'second': ['a', 'c', 'd', 'b', 'e'], 'five': ['b', 'e', 'g']}
Upvotes: 1
Reputation: 73460
The following will work:
from collections import defaultdict
from functools import reduce
helper = defaultdict(set)
for k, vals in dictionary.items():
for val in vals:
helper[val].add(k)
{k: vals for k, vals in dictionary.items()
if len(reduce(set.intersection, map(helper.get, vals)))==1}
# {'second': ['a', 'c', 'd', 'b', 'e'],
# 'third': ['a', 'e', 'f'],
# 'four': ['f', 'b', 'c'],
# 'five': ['b', 'e', 'g']}
The helper
data structure stores sets of keys for every appearing value. The dict comprehension then filters for items with value lists for which there is exactly one such key.
Upvotes: 0