singrium
singrium

Reputation: 3016

delete the second item that starts with the same substring

I have a list l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
I want to delete the elements that start with the same sub-string if they exist (in this case 'abcd' and 'ghi').
N.B: in my situation, I know that the 'repeated' elements, if they exist, can be only 'abcd' or 'ghi'.
To delete them, I used this:

>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
>>> l.remove('ghi') if ('ghijklm' in l and 'ghi' in l) else l
>>> l
>>> ['abcdef', 'ghijklm', 'xyz', 'pqrs']

Is there a more efficient (or more automated) way to do this?

Upvotes: 3

Views: 108

Answers (6)

user3064538
user3064538

Reputation:

You can do it in linear time and O(n*m²) memory (where m is the length of your elements):

prefixes = {}
for word in l:
    for x in range(len(word) - 1):
        prefixes[word[:x]] = True

result = [word for word in l if word not in prefixes]

Iterate over each word and create a dictionary of the first character of each word, then the first two characters, then three, all the way up to all the characters of the word except the last one. Then iterate over the list again and if a word appears in that dictionary it's a shorter subset of some other word in the list

Upvotes: 2

vaku
vaku

Reputation: 712

@Andrew Allen's way

l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
i=0
l = sorted(l)
while True:
 try:
  if l[i] in l[i+1]:
   l.remove(l[i])
   continue
  i += 1
 except:
  break
print(l)
#['abcdef', 'ghijklm', 'pqrs', 'xyz']

Upvotes: 1

milanbalazs
milanbalazs

Reputation: 5329

The following code does what you described.

your_list = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print("Original list: %s" % your_list)
helper_list = []
for element in your_list:
    for element2 in your_list:
        if element.startswith(element2) and element != element2:
            print("%s starts with %s" % (element, element2))
            print("Remove: %s" % element)
            your_list.remove(element)
print("Removed list: %s" % your_list)

Output:

Original list: ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
abcdef starts with abcd
Remove: abcdef
ghijklm starts with ghi
Remove: ghijklm
Removed list: ['abcd', 'ghi', 'xyz', 'pqrs']

On the other hand, I think there is more simple solution and you can solve it with list comprehension if you want.

Upvotes: 1

Patrick Artner
Patrick Artner

Reputation: 51643

You can use

l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
if "abcdef" in l:  # only 1 check for containment instead of 2
    l = [x for x in l if x != "abcd"]  # to remove _all_ abcd
    # or
    l = l.remove("abcd")               # if you know there is only one abcd in it

This might be slightly faster (if you have far more elements then you show) because you only need to check once for "abcdef" - and then once untile the first/all of list for replacement.

>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l

checks l twice for its full size to check containment (if unlucky) and then still needs to remove something from it


DISCLAIMER:
If this is NOT proven, measured bottleneck or security critical etc. I would not bother to do it unless I have measurements that suggests this is the biggest timesaver/optimization of all code overall ... with lists up to some dozends/hundreds (tummy feeling - your data does not support any analysis) the estimated gain from it is negligable.

Upvotes: -1

Manoj Kumar Biroj
Manoj Kumar Biroj

Reputation: 323

Try this it will work

l =['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for i in l:
    for j in l:
        if len(i)>len(j) and j in i:
            l.remove(j)

Upvotes: 0

Nithin
Nithin

Reputation: 1105

l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']

for a in l[:]:
    for b in l[:]:
        if a.startswith(b) and a != b:
            l.remove(b)
print(l)

Output

['abcdef', 'ghijklm', 'xyz', 'pqrs']

Upvotes: 1

Related Questions