Reputation: 931
I'm looking for the most efficient way to reduce a given list based off of substrings already in the list.
For example
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
would be reduced to:
mylist = ['abcd','qrs']
because both 'abcd' and 'qrs' are the smallest substring of other elements in that list. I was able to do this with about 30 lines of code, but I suspect there is a crafty one-liner out there..
Upvotes: 4
Views: 472
Reputation: 40836
One solution is to iterate over all the strings and split them based on if they had different characters, and recursively apply that function.
def reduce_substrings(strings):
return list(_reduce_substrings(map(iter, strings)))
def _reduce_substrings(strings):
# A dictionary of characters to a list of strings that begin with that character
nexts = {}
for string in strings:
try:
nexts.setdefault(next(string), []).append(string)
except StopIteration:
# Reached the end of this string. It is the only shortest substring.
yield ''
return
for next_char, next_strings in nexts.items():
for next_substrings in _reduce_substrings(next_strings):
yield next_char + next_substrings
This splits it into a dictionary based on the character, and tries to find the shortest substring out of those that it split into a different list in the dictionary.
Of course, because of the recursive nature of this function, a one-liner wouldn't be possible as efficiently.
Upvotes: 0
Reputation: 12927
Probably not the most efficient, but at least short:
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
outlist = []
for l in mylist:
if any(o.startswith(l) for o in outlist):
# l is a prefix of some elements in outlist, so it replaces them
outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
if not any(l.startswith(o) for o in outlist):
# l has no prefix in outlist yet, so it becomes a prefix candidate
outlist.append(l)
print(outlist)
Upvotes: 0
Reputation: 10971
this seems to be working (but not so efficient i suppose)
def reduce_prefixes(strings):
sorted_strings = sorted(strings)
return [element
for index, element in enumerate(sorted_strings)
if all(not previous.startswith(element) and
not element.startswith(previous)
for previous in sorted_strings[:index])]
tests:
>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
'qrs', 'qrst', 'qrstu',
'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']
Upvotes: 3
Reputation: 1342
Try this one:
import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
if re.match("^abcd$",i):
new_list.append(i)
elif re.match("^qrs$",i):
new_list.append(i)
print(new_list)
#['abcd', 'qrs']
Upvotes: -1