Reputation: 942
I have the following Python 2.7 dictionary data structure (I do not control source data - comes from another system as is):
{112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com'] }, 112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com'] }, 112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'] }, 112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1'] }, 112765083670: ... }
The dictionary keys will always be unique. Dst, src, and alias can be duplicates. All records will always have a dst and src but not every record will necessarily have an alias as seen in the third record.
In the sample data either of the first two records would be removed (doesn't matter to me which one). The third record would be considered unique since although dst and src are the same it is missing alias.
My goal is to remove all records where the dst, src, and alias have all been duplicated - regardless of the key.
How does this rookie accomplish this?
Also, my limited understanding of Python interprets the data structure as a dictionary with the values stored in dictionaries... a dict of dicts, is this correct?
Upvotes: 43
Views: 143034
Reputation: 11
I solved it using compressed dictionary method:
dic = {112762853378:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112762853385:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112760496444:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4']
},
112760496502:
{'dst': ['10.122.195.34'],
'src': ['4.3.2.1']
}
}
result = {k:v for k,v in dic.items() if list(dic.values()).count(v)==1}
Upvotes: 1
Reputation: 16
I would just make a set of the list of keys then iterate over them into a new dict:
input_raw = {112762853378:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112762853385:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112760496444:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4']
},
112760496502:
{'dst': ['10.122.195.34'],
'src': ['4.3.2.1']
}
}
filter = list(set(list(input_raw.keys())))
fixedlist = {}
for i in filter:
fixedlist[i] = logins[i]
Upvotes: 0
Reputation: 1
example = {
'id1': {'name': 'jay','age':22,},
'id2': {'name': 'salman','age': 52,},
'id3': {'name':'Ranveer','age' :26,},
'id4': {'name': 'jay', 'age': 22,},
}
for item in example:
for value in example:
if example[item] ==example[value]:
if item != value:
key = value
del example[key]
print "example",example
Upvotes: -3
Reputation: 120608
Another reverse dict variation:
>>> import pprint
>>>
>>> data = {
... 112762853378:
... {'dst': ['10.121.4.136'],
... 'src': ['1.2.3.4'],
... 'alias': ['www.example.com']
... },
... 112762853385:
... {'dst': ['10.121.4.136'],
... 'src': ['1.2.3.4'],
... 'alias': ['www.example.com']
... },
... 112760496444:
... {'dst': ['10.121.4.136'],
... 'src': ['1.2.3.4']
... },
... 112760496502:
... {'dst': ['10.122.195.34'],
... 'src': ['4.3.2.1']
... },
... }
>>>
>>> keep = set({repr(sorted(value.items())):key
... for key,value in data.iteritems()}.values())
>>>
>>> for key in data.keys():
... if key not in keep:
... del data[key]
...
>>>
>>> pprint.pprint(data)
{112760496444L: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
112760496502L: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
112762853378L: {'alias': ['www.example.com'],
'dst': ['10.121.4.136'],
'src': ['1.2.3.4']}}
Upvotes: 3
Reputation: 1319
dups={}
for key,val in dct.iteritems():
if val.get('alias') != None:
ref = "%s%s%s" % (val['dst'] , val['src'] ,val['alias'])# a simple hash
dups.setdefault(ref,[])
dups[ref].append(key)
for k,v in dups.iteritems():
if len(v) > 1:
for key in v:
del dct[key]
Upvotes: 2
Reputation: 27575
input_raw = {112762853378: {'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com'] },
112762853385: {'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com'] },
112760496444: {'dst': ['10.121.4.299'],
'src': ['1.2.3.4'] },
112760496502: {'dst': ['10.122.195.34'],
'src': ['4.3.2.1'] },
112758601487: {'src': ['1.2.3.4'],
'alias': ['www.example.com'],
'dst': ['10.121.4.136']},
112757412898: {'dst': ['10.122.195.34'],
'src': ['4.3.2.1'] },
112757354733: {'dst': ['124.12.13.14'],
'src': ['8.5.6.0']},
}
for x in input_raw.iteritems():
print x
print '\n---------------------------\n'
seen = []
for k,val in input_raw.items():
if val in seen:
del input_raw[k]
else:
seen.append(val)
for x in input_raw.iteritems():
print x
result
(112762853385L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757354733L, {'src': ['8.5.6.0'], 'dst': ['124.12.13.14']})
(112758601487L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757412898L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496502L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496444L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.299']})
(112762853378L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
---------------------------
(112762853385L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757354733L, {'src': ['8.5.6.0'], 'dst': ['124.12.13.14']})
(112757412898L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496444L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.299']})
The facts that this solution creates first a list input_raw.iteritems() (as in Andrew's Cox's answer) and requires a growing list seen are drawbacks.
But the first can't be avoided (using iteritems() doesn't work) and the second is less heavy than re-creating a list result.values() from growing list result for each turn of a loop.
Upvotes: 4
Reputation: 150997
One simple approach would be to create a reverse dictionary using the concatenation of the string data in each inner dictionary as a key. So say you have the above data in a dictionary, d
:
>>> import collections
>>> reverse_d = collections.defaultdict(list)
>>> for key, inner_d in d.iteritems():
... key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
... reverse_d[key_str].append(key)
...
>>> duplicates = [keys for key_str, keys in reverse_d.iteritems() if len(keys) > 1]
>>> duplicates
[[112762853385, 112762853378]]
If you don't want a list of duplicates or anything like that, but just want to create a duplicate-less dict, you could just use a regular dictionary instead of a defaultdict
and re-reverse it like so:
>>> for key, inner_d in d.iteritems():
... key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
... reverse_d[key_str] = key
>>> new_d = dict((val, d[val]) for val in reverse_d.itervalues())
Upvotes: 6
Reputation: 9502
from collections import defaultdict
dups = defaultdict(lambda : defaultdict(list))
for key, entry in data.iteritems():
dups[tuple(entry.keys())][tuple([v[0] for v in entry.values()])].append(key)
for dup_indexes in dups.values():
for keys in dup_indexes.values():
for key in keys[1:]:
if key in data:
del data[key]
Upvotes: 1
Reputation: 10988
You could go though each of the items (the key value pair) in the dictionary and add them into a result dictionary if the value was not already in the result dictionary.
input_raw = {112762853378:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112762853385:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112760496444:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4']
},
112760496502:
{'dst': ['10.122.195.34'],
'src': ['4.3.2.1']
}
}
result = {}
for key,value in input_raw.items():
if value not in result.values():
result[key] = value
print result
Upvotes: 53
Reputation: 110301
Since the way to find uniqueness in correspondences is exactly to use a dictionary, with the desired unique value being the key, the way to go is to create a reversed dict, where your values are composed as the key - then recreate a "de-reversed" dictionary using the intermediate result.
dct = {112762853378:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112762853385:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4'],
'alias': ['www.example.com']
},
112760496444:
{'dst': ['10.121.4.136'],
'src': ['1.2.3.4']
},
112760496502:
{'dst': ['10.122.195.34'],
'src': ['4.3.2.1']
},
}
def remove_dups (dct):
reversed_dct = {}
for key, val in dct.items():
new_key = tuple(val["dst"]) + tuple(val["src"]) + (tuple(val["alias"]) if "alias" in val else (None,) )
reversed_dct[new_key] = key
result_dct = {}
for key, val in reversed_dct.items():
result_dct[val] = dct[val]
return result_dct
result = remove_dups(dct)
Upvotes: 2