SantoshGupta7

Reputation: 6197

How to remove items in a list of strings based on duplicate substrings among the elements?

I have a list of files from different paths, but some of those paths contain the same file (same file name).

I would like to remove these duplicate files, but since they come from different paths, I can't just do set(thelist)

Minimal Example

Say that my list looks like this

thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']

What is the most Pythonic way to get this?

deduplicatedList = ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']

The file file13332 was in the list twice. I am not concerned about which of the duplicate elements is removed.

Upvotes: 0

Views: 46

Answers (2)

chash

Reputation: 4433

import os

s = set()
deduped = [s.add(os.path.basename(i)) or i for i in thelist if os.path.basename(i) not in s]

s accumulates the basenames seen so far, which guards against adding entries with duplicate basenames to deduped. The if clause is evaluated first, and since set.add returns None, the expression s.add(...) or i evaluates to i itself.
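Run against the question's thelist, the comprehension keeps the first occurrence of each basename:

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

s = set()
# The `if` clause runs before the expression, so paths whose basename
# was already seen are skipped entirely; `s.add(...) or i` adds the
# basename as a side effect and yields the original path i.
deduped = [s.add(os.path.basename(i)) or i
           for i in thelist
           if os.path.basename(i) not in s]

print(deduped)
# ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```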

Upvotes: 1

ywbaek

Reputation: 3031

One way is to use a dictionary:

thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']

deduplicatedList = list({f.split('/')[-1]: f for f in thelist}.values())

print(deduplicatedList)
['path1232/path1112/file13332', '/path11/path21/file21', '/path1/path2/file13339']
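As a small variant (my addition, not part of the original answer), os.path.basename can stand in for split('/')[-1] when computing the key. Note that this approach keeps the last path with a given basename, because later dict assignments overwrite earlier ones, while insertion order is preserved (Python 3.7+):

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

# Key each path by its basename; duplicates collapse onto one key,
# with the last occurrence's full path winning.
deduplicatedList = list({os.path.basename(f): f for f in thelist}.values())

print(deduplicatedList)
# ['path1232/path1112/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```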

Upvotes: 5

Related Questions