Reputation: 69
I have a dictionary like this:
Files:
{'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
'key2': ['f', 'f', 'f', 'f', 'f'],
'key_file': ['file1', 'file1', 'file2', 'file1', 'file2']}
I want to delete all the duplicate values in 'key_file' and the corresponding values in the other keys ('key1' and 'key2').
Desired dictionary:
Files:
{'key1': ['path1', 'path2'],
'key2': ['f', 'f'],
'key_file': ['file1', 'file2']}
I couldn't figure out a solution which preserved the order and deleted every duplicate item and their values in the other keys.
Thanks a lot.
EDIT:
'key2': ['f', 'f', 'f', 'f', 'f']
becomes
'key2': ['f', 'f'],
because there are two distinct files.
I don't want to delete every duplicate in every key. 'path1' is related to 'file1' and 'path2' is related to 'file2', as is the 'f' in key2 in both cases. In reality there are several more keys, but this is my minimal example. That is my problem: I have only found solutions that delete every duplicate.
EDIT2:
Maybe I was a bit confusing.
Every key holds a list of the same length, as they describe a filename (in key_file), the corresponding path (in key1) and some other descriptive strings (in key2, etc.). It can happen that the same file is stored in different locations (paths), but I know that it is the same file if the filename is exactly the same.
Basically, what I am looking for is a function that detects the second value of key_file (file1) as a duplicate of the first value (file1) and deletes the second value from every key. The same goes for values number 4 (file1) and 5 (file2). The resulting dictionary would then look like the one I mentioned.
I hope this explains it better.
Upvotes: 0
Views: 2726
Reputation:
Here is my implementation:
In [1]: mydict = {'key1': ['path1', 'path1', 'path2', 'path1', 'path2'], 'key2': ['f', 'f', 'f', 'f', 'f'], 'key_file': ['file1', 'file1', 'file2', 'file1', 'file2']}
In [2]: { k: sorted(list(set(v))) for (k,v) in mydict.iteritems() }
Out[2]: {'key1': ['path1', 'path2'], 'key2': ['f'], 'key_file': ['file1', 'file2']}
Test
In [6]: mydict
Out[6]:
{'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
'key2': ['f', 'f', 'f', 'f', 'f'],
'key_file': ['file1', 'file1', 'file2', 'file1', 'file2']}
In [7]: uniq = { k: sorted(list(set(v))) for (k,v) in mydict.iteritems() }
In [8]: for key in uniq:
   ...:     print 'KEY :', key
   ...:     print 'VALUE :', uniq[key]
   ...:     print '-------------------'
   ...:
KEY : key2
VALUE : ['f']
-------------------
KEY : key1
VALUE : ['path1', 'path2']
-------------------
KEY : key_file
VALUE : ['file1', 'file2']
-------------------
Upvotes: 0
Reputation: 16556
A naive approach: iterate over the values of key_file and, for each filename not seen before, append the value at the same index of every key to a new dict:
>>> newFiles={'key1': [], 'key2':[], 'key_file':[]}
>>> for i,j in enumerate(Files['key_file']):
...     if j not in newFiles['key_file']:
...         for key in newFiles.keys():
...             newFiles[key].append(Files[key][i])
...
>>> newFiles
{'key2': ['f', 'f'], 'key1': ['path1', 'path2'], 'key_file': ['file1', 'file2']}
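With OrderedDict (starting again from an empty newFiles):
>>> from collections import OrderedDict
>>> newFiles={'key1': [], 'key2':[], 'key_file':[]}
>>> for j in OrderedDict.fromkeys(Files['key_file']):
...     i = Files['key_file'].index(j)
...     if j not in newFiles['key_file']:
...         for key in newFiles.keys():
...             newFiles[key].append(Files[key][i])
...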
>>> newFiles
{'key2': ['f', 'f'], 'key1': ['path1', 'path2'], 'key_file': ['file1', 'file2']}
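The test `j not in newFiles['key_file']` scans a list on every iteration. A minimal Python 3 sketch of the same idea, using a set of already-seen filenames instead, could look like this. The function name `dedupe_by_key` and its parameters are hypothetical, and the sketch assumes every list in the dictionary has the same length.

```python
def dedupe_by_key(columns, ref_key):
    """Keep only the indices where the value under ref_key appears for the first time."""
    seen = set()
    result = {key: [] for key in columns}
    for i, value in enumerate(columns[ref_key]):
        if value in seen:
            continue  # duplicate of an earlier entry: skip this index in every list
        seen.add(value)
        for key in columns:
            result[key].append(columns[key][i])
    return result


Files = {
    'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
    'key2': ['f', 'f', 'f', 'f', 'f'],
    'key_file': ['file1', 'file1', 'file2', 'file1', 'file2'],
}

newFiles = dedupe_by_key(Files, 'key_file')
print(newFiles)
```

Because only `key_file` decides what counts as a duplicate, this keeps the first path seen for each filename even if the same file shows up under several paths.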
Note: if a "file" in key_file always has the same key1 and key2, there are better ways, for instance using zip:
>>> z=zip(*Files.values())
>>> z
[('f', 'path1', 'file1'), ('f', 'path1', 'file1'), ('f', 'path2', 'file2'), ('f', 'path1', 'file1'), ('f', 'path2', 'file2')]
>>> OrderedDict.fromkeys(z)
OrderedDict([(('f', 'path1', 'file1'), None), (('f', 'path2', 'file2'), None)])
>>> list(OrderedDict.fromkeys(z))
[('f', 'path1', 'file1'), ('f', 'path2', 'file2')]
>>> zip(*OrderedDict.fromkeys(z))
[('f', 'f'), ('path1', 'path2'), ('file1', 'file2')]
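The session above is Python 2, where `zip` returns a list. A possible Python 3.7+ sketch of the same zip approach, which also rebuilds a dictionary with the original keys, is shown here. The variable names `rows`, `unique_rows` and `deduped` are illustrative. It relies on plain dicts preserving insertion order, and it removes repeated whole rows, so it only matches the desired result when a filename always comes with the same values in the other keys.

```python
Files = {
    'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
    'key2': ['f', 'f', 'f', 'f', 'f'],
    'key_file': ['file1', 'file1', 'file2', 'file1', 'file2'],
}

# One tuple per index, with the fields in the same order as the dict keys.
rows = zip(*Files.values())

# dict.fromkeys drops repeated rows and keeps the first-seen order.
unique_rows = list(dict.fromkeys(rows))

# Transpose back and pair each column with its key.
deduped = {key: list(column) for key, column in zip(Files, zip(*unique_rows))}
print(deduped)
```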
Upvotes: 2
Reputation: 82899
As I understand the question, it seems that corresponding values in the different lists in the dictionary belong together, while values within the same list are unrelated to each other. In this case, I'd suggest using a different data structure. Instead of having a dictionary with three lists of items, you can make one list holding triplets.
>>> files = {'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
...          'key2': ['f', 'f', 'f', 'f', 'f'],
...          'key_file': ['file1', 'file1', 'file2', 'file1', 'file2']}
>>> files2 = set(zip(files["key1"], files["key2"], files["key_file"]))
>>> print files2
set([('path2', 'f', 'file2'), ('path1', 'f', 'file1')])
Or if you want to make it more dictionary-like, you could do this, afterwards:
>>> files3 = [{"key1": k1, "key2": k2, "key_file": kf} for k1, k2, kf in files2]
>>> print files3
[{'key2': 'f', 'key1': 'path2', 'key_file': 'file2'},
{'key2': 'f', 'key1': 'path1', 'key_file': 'file1'}]
Note that the order of the triplets in the top-level list may be different, but items that belong together are still together in the contained tuples or dictionaries.
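If the original order matters, a set is a poor fit because it does not remember insertion order. The following Python 3.7+ sketch builds one record per index and keeps the first record seen for each filename. The names `records` and `unique` are illustrative and not part of the question.

```python
files = {
    'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
    'key2': ['f', 'f', 'f', 'f', 'f'],
    'key_file': ['file1', 'file1', 'file2', 'file1', 'file2'],
}

# One small dict per index: {'key1': ..., 'key2': ..., 'key_file': ...}
keys = list(files)
records = [dict(zip(keys, row)) for row in zip(*files.values())]

# setdefault stores a record only the first time its filename is seen;
# the dict keeps the filenames in first-seen order.
unique = {}
for record in records:
    unique.setdefault(record['key_file'], record)

files3 = list(unique.values())
print(files3)
```

Unlike the set of triplets, this treats two entries as duplicates when only the filename matches, which is closer to what the question describes.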
Upvotes: 0
Reputation: 107287
You can use collections.OrderedDict to keep your dictionary in order and set to remove the duplicates:
>>> d={'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
... 'key2': ['f', 'f', 'f', 'f', 'f'],
... 'key_file': ['file1', 'file1', 'file2', 'file1', 'file2']}
>>> from collections import OrderedDict
>>> OrderedDict(sorted([(i,list(set(j))) for i,j in d.items()], key=lambda t: t[0]))
OrderedDict([('key1', ['path2', 'path1']), ('key2', ['f']), ('key_file', ['file2', 'file1'])])
You need to apply set to the values to remove duplicates, then sort the items by key, and finally use OrderedDict to keep the dictionary in that sorted order.
Edit: if you want all values to have the same length as the longest one, use the following:
>>> s=sorted([(i,list(set(j))) for i,j in d.items()], key=lambda t: t[0])
>>> M=max(map(len,[i[1] for i in s]))
>>> f_s=[(i,j) if len(j)==M else (i,[j[0] for t in range(M)]) for i,j in s]
>>> f_s
[('key1', ['path2', 'path1']), ('key2', ['f', 'f']), ('key_file', ['file2', 'file1'])]
>>> OrderedDict(f_s)
OrderedDict([('key1', ['path2', 'path1']), ('key2', ['f', 'f']), ('key_file', ['file2', 'file1'])])
But if you just want the first 2 elements of each value, you can use slicing:
>>> OrderedDict(sorted([(i,j[:2]) for i,j in d.items()],key=lambda x: x[0])
... )
OrderedDict([('key1', ['path1', 'path1']), ('key2', ['f', 'f']), ('key_file', ['file1', 'file1'])])
Upvotes: 1
Reputation: 52071
OrderedDict is the best choice, as it preserves order.
You can convert each list to a set and then back to a list.
Example:
for i in d:
    d[i] = list(set(d[i]))
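Two caveats apply to converting each list to a set: a set does not guarantee any particular order, and each key is deduplicated on its own. A small Python 3.7+ sketch, using `dict.fromkeys` as an order-preserving stand-in for `set` (the name `per_key` is illustrative), shows the effect on the data from the question:

```python
d = {
    'key1': ['path1', 'path1', 'path2', 'path1', 'path2'],
    'key2': ['f', 'f', 'f', 'f', 'f'],
    'key_file': ['file1', 'file1', 'file2', 'file1', 'file2'],
}

# dict.fromkeys keeps the first occurrence of each value, in its original order.
per_key = {key: list(dict.fromkeys(values)) for key, values in d.items()}
print(per_key)
# {'key1': ['path1', 'path2'], 'key2': ['f'], 'key_file': ['file1', 'file2']}
```

Here 'key2' collapses to a single 'f', so the lists no longer line up index by index, which differs from the two-element 'key2' the question asks for.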
Upvotes: 1