Reputation: 172
I have two defaultdicts:
defaultdict(<type 'list'>, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
defaultdict(<type 'list'>, {'a': ['OS', 'sys'], 'b': ['OS']})
How do I compare these two to get the count of values missing from each one?
For example, I should get that two values are missing from the second defaultdict for key 'a' and one for key 'b'.
Upvotes: 4
Views: 4275
Reputation: 44545
Here we present an alternate solution using collections.Counter to track values, and we consider some edge cases concerning uncommon keys and values.
Code
import collections as ct


def compare_missing(d1, d2, verbose=False):
    """Return the count of missing values from dict 2 compared to dict 1."""
    record = {}
    # Only keys present in both dicts are compared.
    for k in d1.keys() & d2.keys():
        a, b = ct.Counter(d1[k]), ct.Counter(d2[k])
        # Counter subtraction keeps only the positive counts.
        record[k] = a - b
    if verbose:
        print(record)
    return sum(v for c in record.values() for v in c.values())
Demo
dd0 = ct.defaultdict(list, {"a": ["OS", "sys", "procs"], "b": ["OS", "sys"]})
dd1 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"]})
compare_missing(dd0, dd1, True)
# {'a': Counter({'procs': 1}), 'b': Counter({'sys': 1})}
# 2
compare_missing(dd1, dd0, True)
# {'a': Counter(), 'b': Counter()}
# 0
Details
compare_missing() will only iterate over common keys. In the next example, dd2 has the same contents as dd1 plus a new key (c), yet we get the same results as above:
dd2 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"], "c": ["OS"]})
compare_missing(dd0, dd2)
# 2
compare_missing(dd2, dd0)
# 0
If uncommon values or replicates are found (i.e. "admin" and the second "OS" in dd3["b"], respectively), these occurrences are counted as well:
dd3 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS", "admin", "OS"]})
compare_missing(dd3, dd0, True)
# {'a': Counter(), 'b': Counter({'OS': 1, 'admin': 1})}
# 2
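To make the difference between the two behaviours concrete, here is a minimal sketch comparing Counter subtraction with a plain set difference on the values from dd3["b"] and dd0["b"]. The names left, right, counter_diff and set_diff are only illustrative and are not part of the answer's code.

```python
import collections as ct

left = ["OS", "admin", "OS"]   # same values as dd3["b"] above
right = ["OS", "sys"]          # same values as dd0["b"] above

# Counter subtraction is multiset-aware: one "OS" is cancelled out,
# the replicate "OS" and the uncommon "admin" remain.
counter_diff = ct.Counter(left) - ct.Counter(right)
print(sum(counter_diff.values()))  # 2

# A set difference ignores replicates, so only "admin" remains.
set_diff = set(left) - set(right)
print(len(set_diff))  # 1
```

Which of the two counts is the "right" one depends on whether repeated values should be treated as separate occurrences.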
Upvotes: 1
Reputation: 310069
You should be able to use set differences to find (and count) missing elements efficiently. If you're careful, you can even do this without adding items to the defaultdict (and without assuming that the function's inputs are defaultdicts).
From there, it becomes just a matter of accumulating those results in a dictionary.
def compare_dict_of_list(d1, d2):
    d = {}
    for key, value in d1.items():
        # d2.get() avoids creating the key in d2 when d2 is a defaultdict.
        diff_count = len(set(value).difference(d2.get(key, [])))
        d[key] = diff_count
    return d
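As a quick check, here is a sketch of how the function might be called on the data from the question. The names dd0, dd1 and the extra key "c" are assumptions added for illustration; the function body is repeated so that the snippet runs on its own.

```python
from collections import defaultdict


def compare_dict_of_list(d1, d2):
    d = {}
    for key, value in d1.items():
        # .get() never inserts the key, unlike d2[key] on a defaultdict.
        d[key] = len(set(value).difference(d2.get(key, [])))
    return d


dd0 = defaultdict(list, {"a": ["OS", "sys", "procs"], "b": ["OS", "sys"]})
dd1 = defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"]})

print(compare_dict_of_list(dd0, dd1))  # {'a': 1, 'b': 1}
print(compare_dict_of_list(dd1, dd0))  # {'a': 0, 'b': 0}

# Hypothetical extra key that exists only in the first dict.
dd0["c"] = ["net"]
print(compare_dict_of_list(dd0, dd1))  # {'a': 1, 'b': 1, 'c': 1}
print("c" in dd1)                      # False: dd1 was not modified
```

Note that because this works on sets, repeated values within one list are counted only once.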
Upvotes: 1
Reputation: 121
If you just want the total number of values missing from the second defaultdict, you can iterate through the first one and take the set difference for each key to count how many values it has that the second one lacks.
If you define the dicts like this:
from collections import defaultdict

a = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
b = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})
This will tell you how many values are missing from dict b:
total_missing_inB = 0
for i in a:
    # Values listed under key i in a but not in b.
    diff = set(a[i]) - set(b[i])
    total_missing_inB += len(diff)
And this will tell you how many values are missing from dict a:
total_missing_inA = 0
for i in b:
    # Values listed under key i in b but not in a.
    diff = set(b[i]) - set(a[i])
    total_missing_inA += len(diff)
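One thing to watch for: indexing a defaultdict with a key it does not have creates that key as a side effect. A possible variation, assuming you would rather leave both dicts untouched, wraps the loop in a helper that uses .get() instead. The name count_missing and the extra dict are hypothetical, introduced only for this sketch.

```python
from collections import defaultdict


def count_missing(first, second):
    """Total number of distinct values, per key, that `first` has and `second` lacks."""
    total = 0
    for key, values in first.items():
        # .get() returns a default without inserting the key into `second`.
        total += len(set(values) - set(second.get(key, [])))
    return total


a = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
b = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

print(count_missing(a, b))  # 2 (missing from b)
print(count_missing(b, a))  # 0 (missing from a)

# Hypothetical dict with a key that b does not have.
extra = defaultdict(list, {'c': ['net']})
print(count_missing(extra, b))  # 1
print('c' in b)                 # False: b gained no new key
```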
Upvotes: 0