tadeufontes

Reputation: 477

How to slice a string where start and end are defined by two different substrings?

So I have a list of strings such as this:

list_strings=["YYYYATGMBMSSBAHHH","CCCCUINDAKSLLL","HHHHKJSHAKJJKKKK","ERRREEJK","XZXZOOOOYYYFFFFAKSXXX","RRRRRKJUUUUNNNNNGYRRRRRR","HHHHSDAFF"]

And I have two lists of patterns to be searched for in each string:

forward_patterns=['ATG', 'KJ', 'OOOO','UI']
reverse_patterns=['GY', 'AKS','BA','JK']

For each string in list_strings, I want to slice from the position of one of the forward_patterns up to the position of one of the reverse_patterns (the start and end patterns themselves should be removed as well). Each string should be sliced only once per list of patterns, using only the first occurrence that is found. It doesn't matter which pattern from either list is found and used for the slicing.

My output in this case would be this:

list_strings=["MBMSS","ND","SHATGKJ","untrimmed","YYYFFFF","UUUUNNNNN","untrimmed"]

I have tried with these for loops but unfortunately it isn't trimming any of them:

for i in range(len(list_strings)):
    for pf in forward_patterns:
        beg=list_strings[i].find(pf)
        for pr in reverse_patterns:
            end=list_strings[i].rfind(pr)
            if(beg !=-1 and end !=-1):
                list_strings[i]=list_strings[i][beg+len(pf):end]
            else:
                list_strings[i]="untrimmed"

Basically I'm getting a list of all "untrimmed" but I don't know why:

list_strings=["untrimmed","untrimmed","untrimmed","untrimmed","untrimmed","untrimmed","untrimmed"]

What could be wrong with my code? Thanks in advance for any answer!

Upvotes: 0

Views: 168

Answers (2)

Ignacio Alorre

Reputation: 7605

Based on the last update:

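res_list = []

for s in list_strings:
    updated_string = s
    found_forward = found_reverse = False
    # Cut everything up to and including the first forward pattern found.
    for f in forward_patterns:
        if f in updated_string:
            updated_string = updated_string[updated_string.index(f)+len(f):]
            found_forward = True
            break
    # Cut everything from the first reverse pattern found onwards.
    for r in reverse_patterns:
        if r in updated_string:
            updated_string = updated_string[:updated_string.index(r)]
            found_reverse = True
            break

    # Keep the slice only if both a start and an end pattern were found.
    if found_forward and found_reverse:
        res_list.append(updated_string)
    else:
        res_list.append("untrimmed")

res_list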
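
As for why the loops in the question end up with "untrimmed" everywhere: the else branch runs for the first forward/reverse pair that does not match both patterns, and it overwrites the string itself. Nothing breaks out of the loops, so every later iteration searches inside the word "untrimmed" and fails again. A minimal sketch that replays those loops on a single string and records each step (original_loops and steps are names made up for this illustration):

```python
forward_patterns = ['ATG', 'KJ', 'OOOO', 'UI']
reverse_patterns = ['GY', 'AKS', 'BA', 'JK']

def original_loops(s):
    """Replay the nested loops from the question on one string, logging each step."""
    steps = []
    for pf in forward_patterns:
        beg = s.find(pf)
        for pr in reverse_patterns:
            end = s.rfind(pr)
            if beg != -1 and end != -1:
                s = s[beg + len(pf):end]
            else:
                # This overwrites the string we are still searching in.
                s = "untrimmed"
            steps.append((pf, pr, s))
    return s, steps

final, steps = original_loops("YYYYATGMBMSSBAHHH")
print(steps[0])  # ('ATG', 'GY', 'untrimmed')
print(final)     # untrimmed
```

The very first pair ('ATG', 'GY') already destroys the string, because 'GY' is not in it. Stopping at the first match with break and only writing "untrimmed" after the loops have finished avoids this.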

Upvotes: 3

user14984131

Reputation:

Your example is a bit confusing. I don't get why 'ERRREEJK' is untrimmed even though 'JK' is in the reverse patterns. Maybe this is what you are looking for?

list_strings=["YYYYATGMBMSSBAHHH","CCCCUINDAKSLLL","HHHHKJSHAKJJKKKK","ERRREEJK","XZXZOOOOYYYFFFFAKSXXX","RRRRRKJUUUUNNNNNGYRRRRRR","HHHHSDAFF"]
forward_patterns=['ATG', 'KJ', 'OOOO','UI']
reverse_patterns=['GY', 'AKS','BA','JK']
 
new_strings = []
for string in list_strings:
    for pattern in forward_patterns:
        _temp = string.split(pattern,1)
        if len(_temp) == 2:
            _temp = _temp[1]
            break
        else:
            _temp = _temp[0]
    for pattern in reverse_patterns:
        # Split once from the right; two parts means the pattern was found.
        _parts = _temp.rsplit(pattern,1)
        if len(_parts) == 2:
            _temp = _parts[0]
            break
    if string == _temp:
        new_strings.append('untrimmed')
    else:
        new_strings.append(_temp)
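
If you would rather require both a start and an end pattern (so 'ERRREEJK' stays untrimmed, as in the question's expected output), one possible alternative is a regex alternation. This is only a sketch under a few assumptions: trim_between is a hypothetical helper name, the start is the leftmost match of any forward pattern in the string, and the end is the first reverse pattern found after that start (not the last one, as rsplit would give).

```python
import re

list_strings = ["YYYYATGMBMSSBAHHH", "CCCCUINDAKSLLL", "HHHHKJSHAKJJKKKK",
                "ERRREEJK", "XZXZOOOOYYYFFFFAKSXXX",
                "RRRRRKJUUUUNNNNNGYRRRRRR", "HHHHSDAFF"]
forward_patterns = ['ATG', 'KJ', 'OOOO', 'UI']
reverse_patterns = ['GY', 'AKS', 'BA', 'JK']

def trim_between(s, starts, ends, placeholder="untrimmed"):
    """Return the text between the first start pattern and the next end pattern."""
    # re.escape keeps the patterns literal even if they contain regex characters.
    start_re = re.compile("|".join(re.escape(p) for p in starts))
    end_re = re.compile("|".join(re.escape(p) for p in ends))

    m_start = start_re.search(s)
    if m_start is None:
        return placeholder
    # Only look for an end pattern after the start pattern has ended.
    m_end = end_re.search(s, m_start.end())
    if m_end is None:
        return placeholder
    return s[m_start.end():m_end.start()]

result = [trim_between(s, forward_patterns, reverse_patterns) for s in list_strings]
print(result)
```

With the lists from the question this gives 'SHAKJ' for the third string rather than the 'SHATGKJ' shown in the expected output; the other six entries match.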

Upvotes: 0
