chungkim271

Reputation: 967

Python regular expression for a pattern that repeats within a line

I have a list of lines. I'm writing a typical text-modifying function that runs through each line in the list and modifies it when a pattern is detected.

After writing several functions of this type, I realized that a pattern may occur multiple times in a line.

For example, this is one of the functions I wrote:

import re

def change_eq(string):
    #inputs a string and outputs the modified string
    #replaces (X####=#) to (X####==#)

    #set pattern
    pat_eq=r"""(.*)              #stuff before
            ([\(\|][A-Z]+[0-9]*) #either ( or | followed by the variable name 
            (=)                  #single equal sign we want to change 
            ([0-9]*[\)\|])       #numeric value of the variable followed by ) or |
            (.*)"""              #stuff after

    p = re.compile(pat_eq, re.X)
    p1 = p.match(string)

    if p1:
        # if pattern in pat_eq is detected, replace that portion of the string with a modified version
        original=p1.group(0)
        fixed=p1.group(1)+p1.group(2)+"=="+p1.group(4)+p1.group(5)
        string_c=string.replace(original,fixed)
        return string_c
    else: 
        # returns the original string
        return string

But for an input string such as

'IF (X2727!=78|FLAG781=0) THEN PURPILN2=(X2727!=78|FLAG781=0)*X2727' 

, group() only works on the last pattern detected in the string, so it changes it to

'IF (X2727!=78|FLAG781=0) THEN PURPILN2=(X2727!=78|FLAG781==0)*X2727' 

, ignoring the first occurrence. I assume that's a result of my function using the group() method.
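The behaviour can be reproduced in isolation. The sketch below uses a condensed single-line version of the pattern (same five groups, without re.X), which is an assumption made for brevity rather than the exact function above. It suggests that match() returns a single match object, and that the greedy leading (.*) group has already swallowed the first occurrence:

```python
import re

# Condensed version of pat_eq from the function above (same groups, no re.X).
p = re.compile(r"(.*)([(|][A-Z]+[0-9]*)(=)([0-9]*[)|])(.*)")

s = 'IF (X2727!=78|FLAG781=0) THEN PURPILN2=(X2727!=78|FLAG781=0)*X2727'
m = p.match(s)

# The greedy first group extends up to the LAST occurrence, so the first
# '|FLAG781=0)' sits inside group(1) and never reaches groups 2 to 4.
print(repr(m.group(1)))
print(m.group(2), m.group(3), m.group(4))
```

So there is only ever one match per line with this pattern, and it is the last one.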

How would I address this issue? I know there is {m,n}, but does it work with match?

Thank you in advance.

Upvotes: 0

Views: 56

Answers (1)

R Phillip Castagna

Reputation: 896

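Different languages handle "global" matches in different ways. In Python, use re.finditer and a for loop to iterate over the resulting match objects. Note that finditer returns non-overlapping matches, so it can only find each occurrence if the pattern covers just the assignment itself: with the leading and trailing (.*) groups in pat_eq, the first match consumes the whole line.

Example with some of your code, assuming the two (.*) groups have been removed from pat_eq (so the remaining groups are numbered 1 to 3):

p = re.compile(pat_eq, re.X)
string_c = string
for match_obj in p.finditer(string):
    # use the match object from the loop, not p1 from the original function
    original = match_obj.group(0)
    fixed = match_obj.group(1) + '==' + match_obj.group(3)
    string_c = string_c.replace(original, fixed)

return string_c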
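As an alternative to looping, re.sub replaces every non-overlapping match in one call. The following is a minimal sketch under some assumptions, not part of the original function: the names EQ_PATTERN and change_eq_all are hypothetical, and the pattern swaps the delimiter groups for a lookbehind and a lookahead so that adjacent assignments such as (A=1|B=2) do not compete for the shared | character.

```python
import re

# Hypothetical helper; the pattern is adapted from pat_eq in the question.
EQ_PATTERN = re.compile(
    r"""(?<=[(|])        # preceded by ( or |, which is not consumed
        ([A-Z]+[0-9]*)   # variable name
        =                # single equal sign we want to change
        (?=[0-9]*[)|])   # followed by the numeric value and ) or |, not consumed
    """,
    re.X,
)

def change_eq_all(string):
    """Replace every (X####=#) style comparison with (X####==#)."""
    return EQ_PATTERN.sub(r"\1==", string)

line = 'IF (X2727!=78|FLAG781=0) THEN PURPILN2=(X2727!=78|FLAG781=0)*X2727'
print(change_eq_all(line))
```

On the {m,n} part of the question: that quantifier only covers consecutive repetitions of the preceding sub-pattern, so it does not help with occurrences scattered through a line.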

Upvotes: 1
