Reputation: 6197
I have a list of files from different paths, but some of those paths contain the same file (and file name).
I would like to remove these duplicate files, but since they come from different paths, I can't simply do set(thelist).
Say my list looks like this:
thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']
What is the most Pythonic way to get this:
deduplicatedList = ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
The file file13332 was in the list twice; I am not concerned about which of the duplicates is removed.
Upvotes: 0
Views: 46
Reputation: 4433
import os

s = set()
deduped = [s.add(os.path.basename(i)) or i for i in thelist if os.path.basename(i) not in s]
s contains the unique basenames seen so far, which guards against adding entries with non-unique basenames to deduped. (set.add returns None, so s.add(...) or i evaluates to i.)
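For readers who find the side-effecting comprehension opaque, the same logic can be sketched as an explicit loop (same first-occurrence-wins behavior, using the question's thelist):

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

seen = set()      # basenames encountered so far
deduped = []
for path in thelist:
    name = os.path.basename(path)
    if name not in seen:       # first occurrence of this basename wins
        seen.add(name)
        deduped.append(path)

print(deduped)
# ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```

This matches the deduplicatedList the question asks for.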
Upvotes: 1
Reputation: 3031
One way is to use a dictionary:
thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']
deduplicatedList = list({f.split('/')[-1]: f for f in thelist}.values())
print(deduplicatedList)
['path1232/path1112/file13332', '/path11/path21/file21', '/path1/path2/file13339']
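Note that the comprehension keeps the last duplicate (later dict writes overwrite earlier ones), which is why the output starts with 'path1232/...'. If the first occurrence should win instead, dict.setdefault can be used with the same basename key (a sketch; the OP said they don't care which duplicate survives, so either is acceptable):

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

d = {}
for f in thelist:
    d.setdefault(os.path.basename(f), f)  # only the first write per key sticks

firstwins = list(d.values())
print(firstwins)
# ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```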
Upvotes: 5