SantoshGupta7

Reputation: 6197

How to remove items in a list of strings based on duplicate substrings among the elements?

I have a list of files from different paths, but some of those paths contain the same file (same file name).

I would like to remove these duplicate files, but since they come from different paths, I can't just do set(thelist)

Minimal Example

Say that my list looks like this

thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']

What is the most Pythonic way to get this?

deduplicatedList = ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']

The file file13332 was in the list twice. I am not concerned about which of the duplicate elements is removed.

Upvotes: 0

Views: 46

Answers (2)

chash

Reputation: 4433

import os

s = set()
deduped = [s.add(os.path.basename(i)) or i for i in thelist if os.path.basename(i) not in s]

s accumulates the basenames seen so far, which guards against adding entries with duplicate basenames to deduped. The if clause is evaluated first, and since set.add returns None, the expression s.add(...) or i evaluates to i itself.
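Run against the question's thelist, the comprehension keeps the first occurrence of each basename:

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

s = set()
# The `if` clause runs before the expression, so paths whose basename
# was already seen are skipped entirely; `s.add(...) or i` adds the
# basename as a side effect and yields the original path i.
deduped = [s.add(os.path.basename(i)) or i
           for i in thelist
           if os.path.basename(i) not in s]

print(deduped)
# ['/path1/path2/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```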

Upvotes: 1

ywbaek

Reputation: 3031

One way is to use a dictionary:

thelist = ['/path1/path2/file13332', '/path11/path21/file21', 'path1232/path1112/file13332', '/path1/path2/file13339']

deduplicatedList = list({f.split('/')[-1]: f for f in thelist}.values())

print(deduplicatedList)
['path1232/path1112/file13332', '/path11/path21/file21', '/path1/path2/file13339']
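As a small variant (my addition, not part of the original answer), os.path.basename can stand in for split('/')[-1] when computing the key. Note that this approach keeps the last path with a given basename, because later dict assignments overwrite earlier ones, while insertion order is preserved (Python 3.7+):

```python
import os

thelist = ['/path1/path2/file13332', '/path11/path21/file21',
           'path1232/path1112/file13332', '/path1/path2/file13339']

# Key each path by its basename; duplicates collapse onto one key,
# with the last occurrence's full path winning.
deduplicatedList = list({os.path.basename(f): f for f in thelist}.values())

print(deduplicatedList)
# ['path1232/path1112/file13332', '/path11/path21/file21', '/path1/path2/file13339']
```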

Upvotes: 5

Related Questions