Reputation: 5565
I was trying to generate a regex to be used in Java using this link.
I can have the following kinds of strings.
1. customer calls <function_name> using <verb> on <uri> with <object>
2. customer calls <function_name> using 'POST' on <uri> with <object>
3. customer calls 'create' using 'POST' on <uri> with <object>
4. customer calls 'create' using 'POST' on <uri>
As you can see, the last portion, from with onward, is optional in my case.
I implemented the following regular expression.
.+call[s]?.+(\'\w+\'|<\w+>).+using.+(\'\w+\'|<\w+>).+on.+(\'\w+\'|<\w+>).*(with.+(\'\w+\'|<\w+>))?
But when I give it string 3, I get the output 'create', 'POST', <object>, null, null instead of 'create', 'POST', <uri>, <object>.
When I give it string 4, the output is 'create', 'POST', <uri>, null, null instead of 'create', 'POST', <uri>.
The regex without (with.+(\'\w+\'|<\w+>))? works properly for string 4.
How can I change this last part so that the section starting from with is optional?
Upvotes: 3
Views: 77
Reputation: 51330
Your regex accepts too much and backtracks a lot due to your overuse of the greedy .+. Remember that every time you write .+ or .*, the regex engine matches everything up to the end of the line and then needs to backtrack. This is both expensive and error-prone: it eats up too much text nearly every time, so you should be very careful when using this construct. It doesn't act like most people expect it to.
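To see the effect concretely, here is a small sketch that runs the pattern from the question against string 3. The class name GreedyDemo and the helper thirdGroup are made up for illustration, and the choice of matches() over find() is an assumption about how the pattern is being used. The redundant \' escapes are dropped, which does not change what the pattern matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile(
        ".+call[s]?.+('\\w+'|<\\w+>).+using.+('\\w+'|<\\w+>).+on.+('\\w+'|<\\w+>).*(with.+('\\w+'|<\\w+>))?");

    // Returns what the third capturing group holds, or null if the input does not match.
    static String thirdGroup(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String s3 = "customer calls 'create' using 'POST' on <uri> with <object>";
        // The greedy ".+" after "on" runs to the end of the line and backtracks
        // only as far as the last token that fits, so group 3 is not <uri>.
        System.out.println(thirdGroup(s3)); // prints <object>
    }
}
```

The .+ after on swallows " <uri> with " before the third group is tried, so the optional with part has nothing left to match and stays null.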
The simple solution in your case is to actually state precisely what you're expecting, and from your example text it looks like you need whitespace, so just use \s+ instead. Your regex becomes:
.+?\bcalls?\s+(\'\w+\'|<\w+>)\s+using\s+(\'\w+\'|<\w+>)\s+on\s+(\'\w+\'|<\w+>)(?:\s+with\s+(\'\w+\'|<\w+>))?
Note that I also changed the first .+ to a lazy .+? (even though you could probably just remove it from the pattern unless you also need the full line to be captured) followed by a word boundary anchor \b. I also changed a group to be noncapturing, since you most probably don't need to capture that.
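For reference, a minimal Java sketch of how this pattern could be used. The class name CallParser and the method parse are hypothetical, and using matches() (whole-line match) rather than find() is an assumption. The \' escapes are written as plain ' since a single quote needs no escaping in a Java regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CallParser {
    // The rewritten pattern: explicit \s+ between tokens, optional non-capturing "with" part.
    static final Pattern CALL = Pattern.compile(
        ".+?\\bcalls?\\s+('\\w+'|<\\w+>)\\s+using\\s+('\\w+'|<\\w+>)"
        + "\\s+on\\s+('\\w+'|<\\w+>)(?:\\s+with\\s+('\\w+'|<\\w+>))?");

    // Returns the four captured tokens (the last one is null when "with ..." is absent),
    // or null if the line does not match at all.
    static String[] parse(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = m.group(i + 1); // groups are 1-based
        }
        return parts;
    }

    public static void main(String[] args) {
        String s3 = "customer calls 'create' using 'POST' on <uri> with <object>";
        String s4 = "customer calls 'create' using 'POST' on <uri>";
        System.out.println(Arrays.toString(parse(s3))); // ['create', 'POST', <uri>, <object>]
        System.out.println(Arrays.toString(parse(s4))); // ['create', 'POST', <uri>, null]
    }
}
```

The fourth group still exists when the with part is missing, so the caller has to check it for null.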
Upvotes: 1
Reputation: 1426
Use [ ]+ in place of .+ to match spaces.
Try this:
.+call(?:s)?.+(\'\w+\'|<\w+>)[ ]*using.+(\'\w+\'|<\w+>)[ ]*on[ ]*(\'\w+\'|<\w+>)[ ]*(?:with)?[ ]*(\'\w+\'|<\w+>)?
You will get
1. <function_name> <verb> <uri> <object>
2. <function_name> 'POST' <uri> <object>
3. 'create' 'POST' <uri> <object>
4. 'create' 'POST' <uri> null
In the 4th row the last one is null because the end token (i.e. <object>) is missing.
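A quick way to check these results is a sketch like the following. The class name SpaceRegexDemo and the method groups are made up for illustration, matches() is assumed, and the \' escapes are again written as plain '.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpaceRegexDemo {
    // The pattern from this answer, with [ ]* around the keywords.
    static final Pattern P = Pattern.compile(
        ".+call(?:s)?.+('\\w+'|<\\w+>)[ ]*using.+('\\w+'|<\\w+>)[ ]*on[ ]*"
        + "('\\w+'|<\\w+>)[ ]*(?:with)?[ ]*('\\w+'|<\\w+>)?");

    // Returns the four capturing groups, or null when the line does not match.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "customer calls <function_name> using <verb> on <uri> with <object>",
            "customer calls <function_name> using 'POST' on <uri> with <object>",
            "customer calls 'create' using 'POST' on <uri> with <object>",
            "customer calls 'create' using 'POST' on <uri>"
        };
        for (String s : inputs) {
            System.out.println(Arrays.toString(groups(s)));
        }
    }
}
```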
Upvotes: 1