Reputation: 121
I'm trying to search a file for expressions such as A*B.
A and B can be built from [A-Z], [a-z], [0-9] and may include <, >, (, ), [, ], _, ., etc., but not commas, semicolons, whitespace, newlines or any other arithmetic operator (+ - \ *). These are the 8 delimiters. There can also be spaces between A and * and B. In addition, the number of opening brackets needs to be the same as the number of closing brackets in A and B.
I unsuccessfully tried something like this (not taking into account operators inside A and B):
import re
fp = open("test", "r")
for line in fp:
    p = re.compile("( |,|;)(.*)[*](.*)( |,|;|\n)")
    m = p.match(line)
    if m:
        print 'Match found ',m.group()
    else:
        print 'No match'
Example 1:
(A1 * B1.list(), C * D * E)
should give 3 matches:
An extension to the problem statement could be that commas, semicolons, whitespace, newlines or any other arithmetic operator (+ - \ *) are allowed in A and B if they are inside brackets:
Example 2:
(A * B.max(C * D, E))
should give 2 matches:
I'm new to regular expressions and curious to find a solution to this.
Upvotes: 4
Views: 120
Reputation: 43083
Regular expressions have limits. The border between regular expressions and text parsing can be thin. IMO, using a parser is a more robust solution in your case.
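To illustrate those limits, here is a minimal sketch of a flat regex applied to Example 1 from the question. The pattern and the name naive_products are my own assumptions, not something taken from the question: each operand is approximated as a run of word characters plus . < > ( ) [ ].

```python
import re

# Hypothetical operand approximation: word characters plus . < > ( ) [ ]
OPERAND = r"[\w.<>()\[\]]+"
PRODUCT = re.compile(r"(%s)\s*\*\s*(%s)" % (OPERAND, OPERAND))


def naive_products(text):
    """Return (A, B) pairs for every non-overlapping A * B the regex finds."""
    return PRODUCT.findall(text)


print(naive_products("(A1 * B1.list(), C * D * E)"))
# [('(A1', 'B1.list()'), ('C', 'D')]
```

The stray ( ends up inside the first operand because a character class cannot count brackets, and D * E is missed because regex matches cannot overlap.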
The examples in the question suggest recursive patterns. Here again, a parser is superior to a regex flavor.
Have a look at this proposed solution: Equation parsing in Python.
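For a rough idea of what a parsing approach can look like without any library, here is a minimal hand-written sketch (it is not the code from the linked answer). It assumes that (, [ and < always open a group closed by ), ] and >, that / counts as a delimiter alongside the ones listed in the question, and it does not reject unbalanced input. The names tokenize, inner_groups and find_products are hypothetical.

```python
# Assumption: these three pairs are the only brackets, and < > are never comparisons.
PAIRS = {"(": ")", "[": "]", "<": ">"}
# Assumption: delimiters from the question, plus "/" in case "\" was meant as division.
DELIMS = set(",; \t\n+-/\\*")


def tokenize(text):
    """Split text into ('operand', s) and ('delim', c) tokens at bracket depth 0."""
    tokens, stack, start = [], [], None
    for i, ch in enumerate(text):
        if stack:
            # Inside brackets everything belongs to the current operand.
            if ch in PAIRS:
                stack.append(PAIRS[ch])
            elif ch == stack[-1]:
                stack.pop()
            continue
        if ch in DELIMS:
            if start is not None:
                tokens.append(("operand", text[start:i]))
                start = None
            if not ch.isspace():
                tokens.append(("delim", ch))  # whitespace is dropped
            continue
        if start is None:
            start = i
        if ch in PAIRS:
            stack.append(PAIRS[ch])
    if start is not None:
        tokens.append(("operand", text[start:]))
    return tokens


def inner_groups(operand):
    """Yield the contents of each outermost bracket group in operand."""
    stack, start = [], None
    for i, ch in enumerate(operand):
        if ch in PAIRS:
            if not stack:
                start = i + 1
            stack.append(PAIRS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:
                yield operand[start:i]


def find_products(text):
    """Return every 'A * B' found at this level, then those nested in brackets."""
    tokens = tokenize(text)
    found = []
    for left, op, right in zip(tokens, tokens[1:], tokens[2:]):
        if left[0] == "operand" and op == ("delim", "*") and right[0] == "operand":
            found.append("%s * %s" % (left[1], right[1]))
    for kind, value in tokens:
        if kind == "operand":
            for inner in inner_groups(value):
                found.extend(find_products(inner))
    return found


print(find_products("(A1 * B1.list(), C * D * E)"))
# ['A1 * B1.list()', 'C * D', 'D * E']
print(find_products("(A * B.max(C * D, E))"))
# ['A * B.max(C * D, E)', 'C * D']
```

The recursion is what a single flat pattern lacks: each bracket group is scanned as its own small input, so delimiters inside brackets never split the enclosing operand.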
Upvotes: 1