How to find pattern in list of strings, remove it from the string, and insert it as the next element in the list?

Question

I have a list of strings that looks something like this:

list_strings = ["The", "11:2dog", "is", "2:33", "a22:11", "german", "shepherd.2:2"]

Here is what I want to do:

For each string in the list, I want to remove the numbers that match the pattern number:number. This pattern will always be at the beginning or the end of the string.
When the pattern is removed from the string, I want to insert it as the next element of the list if it was at the end of the string, or as the previous element of the list if it was at the beginning.

So:

list_strings = ["The", "11:2dog", "is", "2:33", "a22:11", "german", "shepherd.2:2"]

becomes:

new_list_strings = ["The", "11:2", "dog", "is", "2:33", "a", "22:11", "german", "shepherd.", "2:2"]

To find the words that may contain the pattern, I have tried using regular expressions:

import re

words_with_pattern = []
for index, word in enumerate(list_strings):
    # re.search returns None when nothing matches, so test the match object itself
    if re.search(r'\d+:\d+', word):
        # store the position together with the word as a single tuple
        words_with_pattern.append((index, word))

However, this only tells me which words contain the pattern. Once I have that list, I still have to remove the pattern from each string, note whether it was at the beginning or at the end, and insert it at the corresponding index in the list.

Any help? Thanks!

ctwheels · Accepted Answer

This method uses re.findall to get all matches in a string and then combines the results into one list.

The regex \d+:\d+|(?:(?!\d+:\d+).)+ works as follows:

Match either of the following
- \d+:\d+ Matches one or more digits, followed by :, then one or more digits
- (?:(?!\d+:\d+).)+ This is a tempered greedy token: it matches any character, one or more times, but only at positions where \d+:\d+ does not match. That forces the match to stop just before the number:number pattern. The findall method then retries at that new position, where the \d+:\d+ alternative matches instead, which results in multiple matches per string.
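
To see how the two alternatives take turns, here is a minimal sketch that applies the same regex to single words. The names PATTERN and split_word are illustrative and not part of the original answer.

```python
import re

# Same regex as above: either a number:number run, or any stretch of
# characters that does not start a number:number run.
PATTERN = re.compile(r"\d+:\d+|(?:(?!\d+:\d+).)+")

def split_word(word):
    """Return the pieces of `word`, with any number:number run split off."""
    # The regex has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(word)

for word in ["11:2dog", "a22:11", "shepherd.2:2", "german"]:
    print(word, "->", split_word(word))
```

A word without the pattern comes back as a one-element list, so it passes through unchanged when the pieces are concatenated.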

Method 1

The following code is much easier to read than Method 2.

import re

ls = ["The", "11:2dog", "is", "2:33", "a22:11", "german", "shepherd.2:2"]
newls = []
for s in ls:
    newls += re.findall(r"\d+:\d+|(?:(?!\d+:\d+).)+", s)
print(newls)

Method 2

This condenses the code from Method 1 into a one-liner, but it's harder to read. The list of lists is flattened with sum(l, []), a technique taken from another answer.

import re

ls = ["The", "11:2dog", "is", "2:33", "a22:11", "german", "shepherd.2:2"]
print(sum([re.findall(r"\d+:\d+|(?:(?!\d+:\d+).)+", s) for s in ls], []))
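
As a side note, sum(l, []) builds a new list at every step, which can get slow for long inputs. A possible alternative, not part of the original answer, is to flatten with itertools.chain.from_iterable. The variable names below are illustrative.

```python
import re
from itertools import chain

ls = ["The", "11:2dog", "is", "2:33", "a22:11", "german", "shepherd.2:2"]
pattern = re.compile(r"\d+:\d+|(?:(?!\d+:\d+).)+")

# chain.from_iterable walks each per-word result in turn without
# building intermediate concatenated lists.
flat = list(chain.from_iterable(pattern.findall(s) for s in ls))
print(flat)
```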

Result

['The', '11:2', 'dog', 'is', '2:33', 'a', '22:11', 'german', 'shepherd.', '2:2']
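
For comparison, a different technique that appears to give the same result on this input is re.split with a capturing group, which keeps the separators in the output. This is a sketch of an alternative, not the answer's method, and split_on_pattern is a hypothetical helper name.

```python
import re

def split_on_pattern(words):
    """Split each word around number:number runs, keeping the runs."""
    result = []
    for word in words:
        # The capturing group makes re.split return the matched runs too.
        # Empty strings appear when the run sits at the very start or end
        # of the word, so they are filtered out.
        result.extend(piece for piece in re.split(r"(\d+:\d+)", word) if piece)
    return result

ls = ["The", "11:2dog", "is", "2:33", "a22:11", "german", "shepherd.2:2"]
print(split_on_pattern(ls))
```

The regex here is shorter because re.split handles the "everything else" part that the tempered greedy token covers in the findall version.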
