Reputation: 2797
I have two questions.
How do I make an optional character greedy? I'm trying to write a custom parser and want function arguments to be in parentheses. For example, sin x becomes sin(x) and cosh^2 x becomes cosh^2(x). My regex:
import re
input = 'sinh x'
output = re.sub(r'(sin|cos|tan|cot|sec|csc)(h?)\s*(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)', r'\1\2\3(\4)', input)
This works fine. But when I input sinh(x) (an already well-formed expression), it outputs sin(h)(x). I need to make (h?) greedy, or make the match fail if there is nothing left for \4 to match. How do I do that? Note that I can't write ([a-gi-z0-9]), because sinh(h) is valid.
Is there any difference between (h?) and ([h]?)?
Upvotes: 7
Views: 6668
Reputation: 23
This seems a pretty robust way to parse the input into (function) (possible ^2) (parameter)
(sinh?|cosh?|tan|cot|sec|csc)[ (]*([\^a-z0-9]*?) *([a-z0-9]+)\)?$
It is perhaps simpler and more concise than using look-ahead methods.
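A minimal Python sketch of how the pattern above might be used. The helper names parse_call and normalize are hypothetical, and the sketch assumes the function call sits at the end of the input, because the pattern is anchored with $.

```python
import re

# The pattern from this answer, unchanged.
PARSER = re.compile(
    r'(sinh?|cosh?|tan|cot|sec|csc)[ (]*([\^a-z0-9]*?) *([a-z0-9]+)\)?$'
)

def parse_call(text):
    """Return (function, exponent, parameter), or None if nothing matches."""
    m = PARSER.search(text)
    return m.groups() if m else None

def normalize(text):
    """Rewrite the matched call as function + exponent + (parameter)."""
    return PARSER.sub(
        lambda m: f'{m.group(1)}{m.group(2)}({m.group(3)})', text
    )
```

With these helpers, parse_call('cosh^2 x') gives ('cosh', '^2', 'x'), and normalize leaves sinh(x) and sinh(h) unchanged. Note that the pattern as posted lists tan, cot, sec and csc without an optional h.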
Upvotes: 0
Reputation: 780909
Optional characters are already greedy (you would use ?? to make it non-greedy). But greediness just means that it will try to find the longest match that still allows the rest of the regular expression to match. It will still backtrack if necessary. If you want to force failure if there's something following it, one way to do that is with a negative lookahead. I'm posting this for the value of the explanation above. Here's a regexp that uses this:
(sin|cos|tan|cot|sec|csc)(?!.\([^)]*\))(h?)\s*(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)
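To see the backtracking described above, here is a small Python sketch that runs the question's pattern with and without the negative lookahead. The names original, guarded and wrap are illustrative only; the replacement string is the one from the question.

```python
import re

# Greedy h? takes the 'h' when it can; lazy h?? leaves it for what follows.
greedy_groups = re.match(r'(h?)(.*)', 'hx').groups()
lazy_groups = re.match(r'(h??)(.*)', 'hx').groups()

# The question's pattern: (h?) is greedy, but it gives the 'h' back
# when that is the only way for group 4 to match.
original = re.compile(
    r'(sin|cos|tan|cot|sec|csc)(h?)\s*'
    r'(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)'
)

# The same pattern with the negative lookahead placed right after the
# function name, as in the regexp above.
guarded = re.compile(
    r'(sin|cos|tan|cot|sec|csc)(?!.\([^)]*\))(h?)\s*'
    r'(|\^\s*[\(]?\s*\-?\s*\d+\s*[\)]?\s*)?([a-z0-9]+)'
)

REPLACEMENT = r'\1\2\3(\4)'

def wrap(pattern, text):
    """Apply the question's substitution using the given compiled pattern."""
    return pattern.sub(REPLACEMENT, text)
```

Here wrap(original, 'sinh(x)') returns 'sin(h)(x)', because (h?) backtracks to the empty string so that group 4 can take the h. wrap(guarded, 'sinh(x)') returns the input unchanged, since the lookahead rejects the match outright, while wrap(guarded, 'sinh x') still returns 'sinh(x)'.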
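As for the second question, (h?) and ([h]?) match the same thing. A character class mostly helps readability with special characters, e.g. [*]? may be easier to read than \*?.
Upvotes: 2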
Reputation: 726559
Rather than making the optional h greedy, consider disambiguating your grammar by requiring that the letter inside parentheses be prefixed by a space or an opening parenthesis:
((?<=\s|\()[a-z0-9]+)
// ^^^^^^^^^^^^
This lookbehind ensures that 'h' (or any other letter, for that matter) that follows the name of the function without spaces is not treated as a function parameter.
I would change the overall expression as follows:
((?:sin|cos|tan|cot|sec|csc)(?:h)?)\s*(?:[\^](\d+)\s*)?(?:((?<=\s|\()[a-z0-9]+)|[(]((?<=\s|\()[a-z0-9]+)[)])
to add an optional number (one or more digits) after ^, and to make sure that the parentheses are matched (i.e. both parentheses are present, or both parentheses are missing).
Demo (using Java regex engine).
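The same expression also compiles under Python's re module, because both lookbehind alternatives are exactly one character wide. A rough sketch follows; the names PATTERN and parenthesize are hypothetical, and rebuilding the output from the capture groups is my assumption about how the expression would be used.

```python
import re

# Group 1: function name, group 2: exponent digits,
# group 3: bare argument, group 4: argument already in parentheses.
PATTERN = re.compile(
    r'((?:sin|cos|tan|cot|sec|csc)(?:h)?)\s*(?:[\^](\d+)\s*)?'
    r'(?:((?<=\s|\()[a-z0-9]+)|[(]((?<=\s|\()[a-z0-9]+)[)])'
)

def parenthesize(expr):
    """Rewrite every matched call as name, optional ^power, then (argument)."""
    def rebuild(m):
        name, power, bare, wrapped = m.groups()
        arg = bare if bare is not None else wrapped
        exponent = '^' + power if power else ''
        return f'{name}{exponent}({arg})'
    return PATTERN.sub(rebuild, expr)
```

For example, parenthesize('sinh x') and parenthesize('sinh(x)') both give 'sinh(x)', and parenthesize('cosh^2 x') gives 'cosh^2(x)'. An input such as 'sinh' with no argument is left alone, since the lookbehind stops the trailing h from being taken as a parameter.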
Upvotes: 1