Fabs

Reputation: 149

Substitution of certain occurrences of a string with another string in Python

Sorry if someone already posted the same question, but I was unable to find it.

I am trying to replace certain occurrences of a string pattern with something else. The problem is that I do not want to replace all occurrences, just all but one.

For example, imagine I have the string: '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'. The pattern I want to find is: '\w+\d+:\d' (which refers to Seq[number]).

Imagine I want to change all numbers after 'Seq[number]:' except the one following, for example, 'Seq1:'.

Specifically, I want to add 10 to each of these numbers after Seq[number]:.

In the end I would like to have the string:

'(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)'

Is there a way of doing this in a loop? I tried to use re.findall, but it returns all occurrences in the text. How could I incorporate this in a loop?

Thanks!

Upvotes: 0

Views: 65

Answers (1)

Andrew Clark

Reputation: 208425

You can do this using re.sub with a function as the replacement, for example:

>>> import re
>>> s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'
>>> def repl(match):
...     return match.group(1) + str(int(match.group(2)) + 10)
...
>>> re.sub(r'(\w+(?!1:)\d+:)(\d+)', repl, s)
'(M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)'

The restriction to not match Seq1: is handled by the negative lookahead (?!1:). The capturing groups are used just to separate the portion of the string that you want to modify from the rest of it. The replacement function then returns group 1 unchanged, followed by the value from group 2 plus 10.
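To see what the lookahead actually excludes, here is a small sketch that tests the same pattern against individual tokens. The helper name is_rewritten and the labels other than the ones in the question (Seq11, Seq21, X1) are made up for illustration. It suggests that the lookahead skips any label whose trailing number is exactly 1, not only Seq1:, while labels such as Seq11: or Seq21: are still matched:

```python
import re

# Same pattern as in the re.sub call above.
pattern = re.compile(r'(\w+(?!1:)\d+:)(\d+)')

def is_rewritten(token):
    """Return True if the pattern matches the whole token, e.g. 'Seq0:2'."""
    return pattern.fullmatch(token) is not None

# 'Seq11:5', 'Seq21:5' and 'X1:5' are hypothetical tokens, not from the question.
for token in ['Seq0:2', 'Seq1:20', 'Seq11:5', 'Seq21:5', 'X1:5']:
    print(token, is_rewritten(token))
# Seq0:2 True
# Seq1:20 False
# Seq11:5 True
# Seq21:5 True
# X1:5 False
```

If other labels ending in a lone 1 should still be updated, the lookahead alone is probably too broad and an explicit check on the label is safer.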

As suggested by Cilyan in the comments, you could also move the Seq1: restriction into the replacement function, which simplifies the regex. Here is how this would look:

def repl(match):
    if match.group(1) == 'Seq1:':
        return match.group(0)
    return match.group(1) + str(int(match.group(2)) + 10)

result = re.sub(r'(\w+\d+:)(\d+)', repl, s)

Edit: To address the questions in your comment, here is how you could parameterize both the number that you add and the prefix (like Seq1:) that should be ignored:

def make_repl(n, ignore):
    def repl(match):
        if match.group(1) == ignore:
            return match.group(0)
        return match.group(1) + str(int(match.group(2)) + n)
    return repl

result = re.sub(r'(\w+\d+:)(\d+)', make_repl(10, 'Seq1:'), s)
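Since the question asks about doing this in a loop, here is a minimal sketch of the same logic written with re.finditer instead of re.sub. The function name add_in_loop is hypothetical; re.sub with a replacement function is usually the simpler choice, so treat this only as an illustration of what re.sub does for you:

```python
import re

s = '(M:2,Seq0:2):10,Seq1:20,(Seq2:40,Seq3:40)'

def add_in_loop(text, n, ignore):
    """Rebuild text piece by piece, adding n to each number except after `ignore`."""
    pieces = []
    last_end = 0
    for match in re.finditer(r'(\w+\d+:)(\d+)', text):
        # Copy the text between the previous match and this one unchanged.
        pieces.append(text[last_end:match.start()])
        if match.group(1) == ignore:
            # Keep the ignored prefix and its number as they are.
            pieces.append(match.group(0))
        else:
            pieces.append(match.group(1) + str(int(match.group(2)) + n))
        last_end = match.end()
    # Copy whatever follows the last match.
    pieces.append(text[last_end:])
    return ''.join(pieces)

print(add_in_loop(s, 10, 'Seq1:'))
# (M:2,Seq0:12):10,Seq1:20,(Seq2:50,Seq3:50)
```

Unlike re.findall, re.finditer yields match objects, so the loop has access to each match's position and groups.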

Upvotes: 2
