Reputation: 123
I am trying to parse linear equations using a regex. The equations look as follows:
2 * var1.val + 7 * var2 + 9 * var3 = 1
3 * var1.val + 4 * var2 = 9
param1.val * var1.val + 4 * var3 = 7
The coefficients can be numbers or parameters. I want to get the result as:
[2, 7, 9
3, 4, 0
param1.val, 0, 4]
I googled and found some sample code, which I modified to meet my needs. It looks like this:
import re

equations = [' 2 * var1.name + 7 * var2 + 9 * var3 = 1',
             ' 3 * var1.name + 4 * var2 = 9',
             ' param1.val * var1.name + 4 * var3 = 7']
augmented_matrix = {'__b__': [0] * 3}  # initialize the RHS vector
parse_ptrn = r'([+-]?[\d*|\w*][\*]+)(\w+\.?\w+)'
parse_obj = re.compile(parse_ptrn)
for i in range(3):
    e = ''.join(equations[i].split())  # split and join to remove spaces
    left, right = e.split('=')  # separate LHS and RHS
    try:
        augmented_matrix['__b__'][i] = float(right)  # if possible convert RHS to float
    except ValueError:
        augmented_matrix['__b__'][i] = right  # otherwise keep it as a string
    # FOR LHS
    for coeff, var in parse_obj.findall(left):
        if coeff == '':
            coeff = 1
        elif coeff == '-':
            coeff = -1
        else:
            try:
                coeff = float(coeff.replace("*", ""))  # remove * from coeff and convert to float
            except ValueError:
                coeff = coeff.replace("*", "")  # non-numeric coefficient: keep the name
        if var not in augmented_matrix:
            augmented_matrix[var] = [0] * 3
        augmented_matrix[var][i] = coeff
    print(left, right)
    print(parse_obj.findall(left))
It does not correctly parse the third equation because of the parameter. For the first coefficient in the third equation, it gives me the last letter "l" instead of "param1.val". I believe the regex ([+-]?[\d*|\w*][\*]+) should be able to match anything (digits or characters) between the start of the string and the *.
Please help me.
Upvotes: 2
Views: 1282
Reputation:
As a side note, if you know the number of variables in advance (3 here) and
they always appear in the order var1, var2, var3, it is possible to do it all
in a single regex.
# ^(?=.*?\S+\s*\*\s*var[123]).*?(?:(\S+)\s*\*\s*var1.+?)?(?:(\S+)\s*\*\s*var2.+?)?(?:(\S+)\s*\*\s*var3.+?)?$
^
(?=
.*? \S+
\s* \* \s* var [123]
)
.*?
(?:
( \S+ ) # (1)
\s* \* \s* var1
.+?
)?
(?:
( \S+ ) # (2)
\s* \* \s* var2
.+?
)?
(?:
( \S+ ) # (3)
\s* \* \s* var3
.+?
)?
$
Output:
** Grp 0 - ( pos 0 , len 47 )
2 * var1.val + 7 * var2 + 9 * var3 = 1
** Grp 1 - ( pos 0 , len 1 )
2
** Grp 2 - ( pos 24 , len 1 )
7
** Grp 3 - ( pos 35 , len 1 )
9
-------------
** Grp 0 - ( pos 49 , len 47 )
3 * var1.val + 4 * var2 = 9
** Grp 1 - ( pos 49 , len 1 )
3
** Grp 2 - ( pos 73 , len 1 )
4
** Grp 3 - NULL
-------------
** Grp 0 - ( pos 98 , len 47 )
param1.val * var1.val + 4 * var3 = 7
** Grp 1 - ( pos 98 , len 10 )
param1.val
** Grp 2 - NULL
** Grp 3 - ( pos 133 , len 1 )
4
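As a rough sketch of how the pattern above could be applied from Python (the language used in the question), assuming it is compiled with re.VERBOSE and re.MULTILINE and that the three equations are joined by newlines. The names pattern, text and rows are only illustrative, and a missing group is shown as '0' to mirror the matrix the question asks for:

```python
import re

# Same pattern as above, written in verbose mode. MULTILINE makes ^ and $
# anchor at every line (an assumption about how the pattern is meant to be run).
pattern = re.compile(r"""
    ^
    (?= .*? \S+ \s* \* \s* var[123] )
    .*?
    (?: ( \S+ ) \s* \* \s* var1 .+? )?   # (1) coefficient of var1
    (?: ( \S+ ) \s* \* \s* var2 .+? )?   # (2) coefficient of var2
    (?: ( \S+ ) \s* \* \s* var3 .+? )?   # (3) coefficient of var3
    $
""", re.VERBOSE | re.MULTILINE)

text = "\n".join([
    "2 * var1.val + 7 * var2 + 9 * var3 = 1",
    "3 * var1.val + 4 * var2 = 9",
    "param1.val * var1.val + 4 * var3 = 7",
])

# One row per equation; a group that did not take part in the match is None,
# which is replaced by '0' here.
rows = [[g if g is not None else '0' for g in m.groups()]
        for m in pattern.finditer(text)]

for row in rows:
    print(row)
```

Note that this only works when the variables appear in the fixed order var1, var2, var3 within each equation.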
Upvotes: 0
Reputation: 2074
Try using this for your regular expression instead:
parse_ptrn = r'([+-]?[\w.]+\*)(\w+\.?\w+)'
I changed [\d*|\w*] (i.e., ONE character which is a digit \d, asterisk *, pipe |, or word character \w) to [\w.]+ (i.e., AT LEAST ONE word character or decimal point). Note that \d is not necessary because it is a subset of \w (all digits are word characters). Also, your original code would not have worked for multi-digit coefficients, like 10, because it was only selecting ONE character before the *.
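To illustrate, here is a minimal sketch that runs the [\w.]+ form of the pattern over the three equations from the question and collects the coefficients per variable. The helper name parse and the choice to keep coefficients as strings (with '0' for a missing variable) are assumptions made for this example, not part of the original code:

```python
import re

# Coefficient: optional sign, at least one word character or dot, then '*'.
# Variable: word characters with an optional single dot inside.
parse_obj = re.compile(r'([+-]?[\w.]+\*)(\w+\.?\w+)')

equations = ['2 * var1.name + 7 * var2 + 9 * var3 = 1',
             '3 * var1.name + 4 * var2 = 9',
             'param1.val * var1.name + 4 * var3 = 7']


def parse(equations, variables):
    """Return one row of coefficient strings per equation (hypothetical helper)."""
    rows = []
    for eq in equations:
        left = ''.join(eq.split()).split('=')[0]  # strip spaces, keep the LHS
        # Drop the trailing '*' and a leading '+' from each coefficient.
        coeffs = {var: coeff.rstrip('*').lstrip('+')
                  for coeff, var in parse_obj.findall(left)}
        rows.append([coeffs.get(v, '0') for v in variables])
    return rows


rows = parse(equations, ['var1.name', 'var2', 'var3'])
for row in rows:
    print(row)
```

The third row now starts with the full name param1.val rather than a single trailing letter.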
Please note that this will still not work for equations like var1.val + 4 * var2 = 9 due to the lack of a coefficient and * in front of the first variable, var1.val. I will leave this as an exercise for you, but if you have trouble with it, just comment on this answer and I will update it to include that case as well (I assume you would want a coefficient of 1 in that case?).
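One possible way to approach that case, offered only as a sketch: make the "coefficient followed by *" part optional and capture the sign separately, then treat a missing coefficient as 1. The pattern term and the helper parse_lhs are hypothetical names for this example. It assumes the left-hand side has already had its spaces removed and contains only terms of the form coefficient*variable or a bare variable:

```python
import re

# Group 1: optional sign. Group 2: optional coefficient (only when followed by '*').
# Group 3: the variable name, same shape as in the pattern above.
term = re.compile(r'([+-]?)(?:([\w.]+)\*)?(\w+\.?\w+)')


def parse_lhs(left):
    """Map each variable to its coefficient string (hypothetical helper)."""
    result = {}
    for sign, coeff, var in term.findall(left):
        coeff = coeff or '1'      # no explicit coefficient means 1
        if sign == '-':
            coeff = '-' + coeff   # keep the sign with the coefficient
        result[var] = coeff
    return result


left = ''.join('var1.val + 4 * var2 - var3'.split())
print(parse_lhs(left))
```

As with the pattern above, variable names need at least two characters, and a bare number on the left-hand side would be picked up as if it were a variable, so this would need tightening for more general input.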
Upvotes: 1