Reputation: 3016
I have a list l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
I want to delete the elements that start with the same sub-string if they exist (in this case 'abcd'
and 'ghi'
).
N.B: in my situation, I know that the 'repeated' elements, if they exist, can be only 'abcd' or 'ghi'.
To delete them, I used this:
>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
>>> l.remove('ghi') if ('ghijklm' in l and 'ghi' in l) else l
>>> l
>>> ['abcdef', 'ghijklm', 'xyz', 'pqrs']
Is there a more efficient (or more automated) way to do this?
Upvotes: 3
Views: 108
Reputation:
You can do it in linear time and O(n*m²) memory (where m is the length of your elements):
prefixes = {}
for word in l:
for x in range(len(word) - 1):
prefixes[word[:x]] = True
result = [word for word in l if word not in prefixes]
Iterate over each word and create a dictionary of the first character of each word, then the first two characters, then three, all the way up to all the characters of the word except the last one. Then iterate over the list again and if a word appears in that dictionary it's a shorter subset of some other word in the list
Upvotes: 2
Reputation: 712
@Andrew Allen's way
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
i=0
l = sorted(l)
while True:
try:
if l[i] in l[i+1]:
l.remove(l[i])
continue
i += 1
except:
break
print(l)
#['abcdef', 'ghijklm', 'pqrs', 'xyz']
Upvotes: 1
Reputation: 5329
The following code does what you described.
your_list = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print("Original list: %s" % your_list)
helper_list = []
for element in your_list:
for element2 in your_list:
if element.startswith(element2) and element != element2:
print("%s starts with %s" % (element, element2))
print("Remove: %s" % element)
your_list.remove(element)
print("Removed list: %s" % your_list)
Output:
Original list: ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
abcdef starts with abcd
Remove: abcdef
ghijklm starts with ghi
Remove: ghijklm
Removed list: ['abcd', 'ghi', 'xyz', 'pqrs']
On the other hand, I think there is more simple solution and you can solve it with list comprehension if you want.
Upvotes: 1
Reputation: 51643
You can use
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
if "abcdef" in l: # only 1 check for containment instead of 2
l = [x for x in l if x != "abcd"] # to remove _all_ abcd
# or
l = l.remove("abcd") # if you know there is only one abcd in it
This might be slightly faster (if you have far more elements then you show) because you only need to check once for "abcdef" - and then once untile the first/all of list for replacement.
>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
checks l
twice for its full size to check containment (if unlucky) and then still needs to remove something from it
DISCLAIMER:
If this is NOT proven, measured bottleneck or security critical etc. I would not bother to do it unless I have measurements that suggests this is the biggest timesaver/optimization of all code overall ... with lists up to some dozends/hundreds (tummy feeling - your data does not support any analysis) the estimated gain from it is negligable.
Upvotes: -1
Reputation: 323
Try this it will work
l =['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for i in l:
for j in l:
if len(i)>len(j) and j in i:
l.remove(j)
Upvotes: 0
Reputation: 1105
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for a in l[:]:
for b in l[:]:
if a.startswith(b) and a != b:
l.remove(b)
print(l)
Output
['abcdef', 'ghijklm', 'xyz', 'pqrs']
Upvotes: 1