Chris Hall
Chris Hall

Reputation: 931

Reduce list based off of element substrings

I'm looking for the most efficient way to reduce a given list based off of substrings already in the list.

For example

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

would be reduced to:

mylist = ['abcd','qrs']

because both 'abcd' and 'qrs' are the smallest substring of other elements in that list. I was able to do this with about 30 lines of code, but I suspect there is a crafty one-liner out there..

Upvotes: 4

Views: 472

Answers (4)

Artyer
Artyer

Reputation: 40836

One solution is to iterate over all the strings and split them based on if they had different characters, and recursively apply that function.

def reduce_substrings(strings):
    return list(_reduce_substrings(map(iter, strings)))

def _reduce_substrings(strings):
    # A dictionary of characters to a list of strings that begin with that character
    nexts = {}
    for string in strings:
        try:
            nexts.setdefault(next(string), []).append(string)
        except StopIteration:
            # Reached the end of this string. It is the only shortest substring.
            yield ''
            return
    for next_char, next_strings in nexts.items():
        for next_substrings in _reduce_substrings(next_strings):
            yield next_char + next_substrings

This splits it into a dictionary based on the character, and tries to find the shortest substring out of those that it split into a different list in the dictionary.

Of course, because of the recursive nature of this function, a one-liner wouldn't be possible as efficiently.

Upvotes: 0

Błotosmętek
Błotosmętek

Reputation: 12927

Probably not the most efficient, but at least short:

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

outlist = []
for l in mylist:
    if any(o.startswith(l) for o in outlist):
        # l is a prefix of some elements in outlist, so it replaces them
        outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
    if not any(l.startswith(o) for o in outlist):
        # l has no prefix in outlist yet, so it becomes a prefix candidate
        outlist.append(l)

print(outlist)

Upvotes: 0

Azat Ibrakov
Azat Ibrakov

Reputation: 10971

this seems to be working (but not so efficient i suppose)

def reduce_prefixes(strings):
    sorted_strings = sorted(strings)
    return [element
            for index, element in enumerate(sorted_strings)
            if all(not previous.startswith(element) and
                   not element.startswith(previous)
                   for previous in sorted_strings[:index])]

tests:

>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
                    'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
                    'qrs', 'qrst', 'qrstu',
                    'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']

Upvotes: 3

dildeolupbiten
dildeolupbiten

Reputation: 1342

Try this one:

import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
    if re.match("^abcd$",i):
        new_list.append(i)
    elif re.match("^qrs$",i):
        new_list.append(i)
print(new_list)
#['abcd', 'qrs']

Upvotes: -1

Related Questions