Reputation: 303
So starting with a list of strings, as below
string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
I want to remove any element from the list that is a substring of another element, giving the result for instance...
string_list = ['resting', 'looked', 'spit']
I have some code that acheives this but it's embarrassingly ugly and probably needlessly complex. Is there a simple way to do this in Python?
Upvotes: 17
Views: 18810
Reputation: 1480
Here's an un-optimal way, only use if the lists are small:
for str1 in string_list:
for str2 in string_list:
if str1 in str2 and str1 != str2:
string_list.remove(str1)
Upvotes: 0
Reputation: 2230
Here's another way to do it. Assuming you have a sorted list to start with and you don't have to do the sieving inplace, we can just choose the longest strings in one pass:
string_list = sorted(string_list)
sieved = []
for i in range(len(string_list) - 1):
if string_list[i] not in string_list[i+1]:
sieved.append(string_list[i])
Upvotes: -1
Reputation: 8051
First building block: substring.
You can use in
to check:
>>> 'rest' in 'resting'
True
>>> 'sing' in 'resting'
False
Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.
def substringSieve(string_list):
out = []
for s in string_list:
if not any([s in r for r in string_list if s != r]):
out.append(s)
return out
You can speed it up by sorting to reduce the number of comparisons (after all, a longer string can never be a substring of a shorter/equal length string):
def substringSieve(string_list):
string_list.sort(key=lambda s: len(s), reverse=True)
out = []
for s in string_list:
if not any([s in o for o in out]):
out.append(s)
return out
Upvotes: 14
Reputation: 394775
Here's a possible solution:
string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
def string_set(string_list):
return set(i for i in string_list
if not any(i in s for s in string_list if i != s))
print(string_set(string_list))
prints out:
set(['looked', 'resting', 'spit'])
Note I create a set (using a generator expression) to remove possibly duplicated words as it appears that order does not matter.
Upvotes: 3
Reputation: 3121
Here's is the efficient way of doing it (relative to the above solutions ;) ) as this approach reduces the number of comparisons between the list elements a lot. If I have a huge list, I'd definitely go with this and of course you can morph this solution into a lambda function to make it look small:
string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
for item in string_list:
for item1 in string_list:
if item in item1 and item!= item1:
string_list.remove(item)
print string_list
Output:
>>>['resting', 'looked', 'spit']
Hope it helps !
Upvotes: 0
Reputation: 2216
Another one liner:
[string for string in string_list if len(filter(lambda x: string in x,string_list)) == 1]
should be fairly readable, just not that pythonic.
Upvotes: 2
Reputation: 2382
Here is a one-liner that does what you want:
filter(lambda x: [x for i in string_list if x in i and x != i] == [], string_list)
Example:
>>> string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
>>> filter(lambda x: [x for i in string_list if x in i and x != i] == [], string_list)
['resting', 'looked', 'spit']
Upvotes: 0
Reputation: 63978
Here's one method:
def find_unique(original):
output = []
for a in original:
for b in original:
if a == b:
continue # So we don't compare a string against itself
elif a in b:
break
else:
output.append(a) # Executed only if "break" is never hit
return output
if __name__ == '__main__':
original = ['rest', 'resting', 'look', 'looked', 'it', 'split']
print find_unique(original)
It exploits the fact that we can easily check if one string is a substring of another by using the in
operator. It essentially goes through each string, checks to see if it's a substring of another, and appends itself to an output list if it isn't.
This prints out ['resting', 'looked', 'split']
Upvotes: 0