Vitu Tomaz

Reputation: 61

Python's re library's sub method "eating" capturing groups

Introductory gibberish:

My current project is sort of a Lisp Parser, and RegEx is a true wonder, though it's giving me a little headache in this particular function:

What the function should do:

Receive a string containing an equation and return it formatted so that the parser can actually read it (for the moment, that means inserting multiplication signs between variables, parentheses, and numbers).

What it actually does:

The function succeeds in finding the spots to replace, but somewhere in the assembly of the returned string it seems to lose the text matched by the \1 group and squeeze an unprintable character in there instead (a square, shown as [] in the output below, since I couldn't manage to paste it here).

Any insights on why this happens?


Code:

import re

def eqxFormat(eq):
    vars = "x"
    for i in vars:
        eq = re.sub(r'%s([0-9\(])' % i, '%s*\1' %i, eq)
        eq = re.sub(r'([0-9\)])%s' % i, '\1*%s' %i, eq)

    eq = re.sub(r'([0-9])\(', r'\1*(', eq)
    eq = re.sub(r'\)([0-9])', r')*\1', eq)
    return eq

eq = "3(x+2(5-x))^3+2x^2+x(x^-1*exp(x))"

print(eqxFormat(eq))

Output:

3*(x+2*(5-x))^3+[]*x^2+x*[]x^-1*exp(x))

Upvotes: 1

Views: 51

Answers (1)

Padraic Cunningham

Reputation: 180441

You need to make the replacement strings raw strings (the r prefix) whenever they reference a capture group, i.e. r'%s*\1' and r'\1*%s':

  eq = re.sub(r'%s([0-9\(])' % i, r'%s*\1' %i, eq)
  eq = re.sub(r'([0-9\)])%s' % i, r'\1*%s' %i, eq)

Once you add the r prefix, your code will output the correct string:

In [6]: eq = "3(x+2(5-x))^3+2x^2+x(x^-1*exp(x))"

In [7]: eqxFormat(eq)
Out[7]: '3*(x+2*(5-x))^3+2*x^2+x*(x^-1*exp(x))'
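
For reference, here is a sketch of the whole function with the fix applied. The name eqx_format, the variables parameter, and the re.escape call are my own additions for illustration and are not in the original code; the regexes themselves are unchanged:

```python
import re

def eqx_format(eq, variables="x"):
    # Hypothetical rewrite of eqxFormat: same patterns, raw replacement strings.
    for v in variables:
        # Assumption: escape the name in case it is ever a regex metacharacter.
        name = re.escape(v)
        # variable followed by a digit or "("  ->  insert "*" after the variable
        eq = re.sub(r'%s([0-9\(])' % name, r'%s*\1' % v, eq)
        # digit or ")" followed by the variable  ->  insert "*" before the variable
        eq = re.sub(r'([0-9\)])%s' % name, r'\1*%s' % v, eq)

    # digit directly before "(" and ")" directly before a digit
    eq = re.sub(r'([0-9])\(', r'\1*(', eq)
    eq = re.sub(r'\)([0-9])', r')*\1', eq)
    return eq

print(eqx_format("3(x+2(5-x))^3+2x^2+x(x^-1*exp(x))"))
```

This should print the same string as Out[7] above.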

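You could also escape the backslash with another backslash, i.e. '%s*\\1'. If you neither use a raw string nor escape the backslash, Python reads \1 as an octal escape and you get the control character Ctrl-A (\x01) instead of a group reference: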

In [8]: "\1"
Out[8]: '\x01'
In [1]: r"\1"
Out[1]: '\\1'
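
A small sketch of how this plays out inside re.sub itself; the pattern and the input '2x' are made up for the demo, and \g<1> is the standard named-style group reference that re also accepts in replacement strings:

```python
import re

plain = '\1*x'      # Python turns \1 into the single character chr(1)
raw = r'\1*x'       # backslash and "1" survive, so re sees a group reference
escaped = '\\1*x'   # same contents as the raw string

print(len(plain), len(raw), raw == escaped)

pattern = r'([0-9])x'
print(repr(re.sub(pattern, plain, '2x')))        # control character ends up in the result
print(repr(re.sub(pattern, raw, '2x')))          # group 1 is substituted
print(repr(re.sub(pattern, r'\g<1>*x', '2x')))   # explicit form of the same reference
```

The first substitution reproduces the "square" from the question: by the time re.sub runs, the replacement no longer contains a backslash at all, so there is nothing for it to interpret as a group.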

Upvotes: 3
