Philip John

Reputation: 5565

Regex issue to parse a string

I was trying to generate a regex to be used in Java using this link.

I can have the following kind of strings.

1. customer calls <function_name> using <verb> on <uri> with <object>
2. customer calls <function_name> using 'POST' on <uri> with <object>
3. customer calls 'create' using 'POST' on <uri> with <object>
4. customer calls 'create' using 'POST' on <uri>

As you can see, the last portion after with is optional in my case.

I implemented the following regular expression.

.+call[s]?.+(\'\w+\'|<\w+>).+using.+(\'\w+\'|<\w+>).+on.+(\'\w+\'|<\w+>).*(with.+(\'\w+\'|<\w+>))?

But for string 3, the captured groups are 'create', 'POST', <object>, null, null instead of 'create', 'POST', <uri>, <object>. For string 4, the groups are 'create', 'POST', <uri>, null, null instead of 'create', 'POST', <uri>.

The regex without (with.+(\'\w+\'|<\w+>))? works properly for string 4. How can I change the last part so that the section starting at with is optional?
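
A minimal sketch that reproduces the behaviour in Java, assuming the pattern is applied with Matcher.matches(). The class name GreedyRepro and the helper groups are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original post.
class GreedyRepro {
    // The pattern from the question, unchanged (split only for line length).
    static final Pattern ORIGINAL = Pattern.compile(
        ".+call[s]?.+(\\'\\w+\\'|<\\w+>).+using.+(\\'\\w+\\'|<\\w+>)"
        + ".+on.+(\\'\\w+\\'|<\\w+>).*(with.+(\\'\\w+\\'|<\\w+>))?");

    // Returns groups 1..5 (unmatched groups are null),
    // or an empty list if the input does not match at all.
    static List<String> groups(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("customer calls 'create' using 'POST' on <uri> with <object>"));
        System.out.println(groups("customer calls 'create' using 'POST' on <uri>"));
    }
}
```

The pattern has five capturing groups, which is why five values come back. For string 3 the greedy .+ in front of the third group runs to the end of the line and settles on the last token, so <object> lands in group 3 and the optional trailing group matches nothing.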

Upvotes: 3

Views: 77

Answers (2)

Lucas Trzesniewski

Reputation: 51330

Your regex accepts too much and backtracks a lot because of the overuse of the greedy .+. Remember that every time you write .+ or .*, the regex engine first matches everything up to the end of the line and then has to backtrack. This is both expensive and error-prone: it consumes too much text nearly every time, so you should be very careful with this construct. It doesn't behave the way most people expect.
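
To make the difference concrete, here is a small sketch comparing a greedy and a lazy quantifier on the tail of one of the sample strings. The class GreedyVsLazy and the method firstGroup are hypothetical names introduced for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are for illustration only.
class GreedyVsLazy {
    // Returns capture group 1 of the first match of regex in input,
    // or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tail = "on <uri> with <object>";
        // Greedy: .+ consumes the whole input, then backtracks to the LAST token.
        System.out.println(firstGroup(".+(<\\w+>)", tail));
        // Lazy: .+? grows one character at a time and stops at the FIRST token.
        System.out.println(firstGroup(".+?(<\\w+>)", tail));
    }
}
```

The greedy version captures <object> and the lazy version captures <uri>, which is the same effect that pushes <object> into the third group of the original pattern.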

The simple solution in your case is to actually state precisely what you're expecting, and from your example text it looks like you need whitespace, so just use \s+ instead. Your regex becomes:

.+?\bcalls?\s+(\'\w+\'|<\w+>)\s+using\s+(\'\w+\'|<\w+>)\s+on\s+(\'\w+\'|<\w+>)(?:\s+with\s+(\'\w+\'|<\w+>))?

Demo

Note that I also changed the first .+ to a lazy .+? (even though you could probably just remove it from the pattern unless you also need the full line to be captured) followed by a word boundary anchor \b. I also changed a group to be noncapturing, since you most probably don't need to capture that.
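
As a rough sketch of how this pattern could be used from Java, assuming Matcher.find() and a list of the four groups as the result; the class CallParser and the method parse are hypothetical names, and the redundant backslashes before the quotes are dropped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern above; names are illustrative.
class CallParser {
    static final Pattern CALL = Pattern.compile(
        ".+?\\bcalls?\\s+('\\w+'|<\\w+>)\\s+using\\s+('\\w+'|<\\w+>)"
        + "\\s+on\\s+('\\w+'|<\\w+>)(?:\\s+with\\s+('\\w+'|<\\w+>))?");

    // Returns groups 1..4; group 4 is null when the "with ..." part is absent.
    // Returns an empty list if the line does not match.
    static List<String> parse(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = CALL.matcher(line);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("customer calls <function_name> using <verb> on <uri> with <object>"));
        System.out.println(parse("customer calls 'create' using 'POST' on <uri>"));
    }
}
```

Because the with section is a non-capturing group, only four groups exist, and the fourth is simply null for string 4.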

Upvotes: 1

Mahendra

Reputation: 1426

Use [ ]+ in place of .+ wherever only spaces are expected.

Try this:

.+call(?:s)?.+(\'\w+\'|<\w+>)[ ]*using.+(\'\w+\'|<\w+>)[ ]*on[ ]*(\'\w+\'|<\w+>)[ ]*(?:with)?[ ]*(\'\w+\'|<\w+>)?

You will get

1. <function_name> <verb> <uri> <object>
2. <function_name> 'POST' <uri> <object>
3. 'create' 'POST' <uri> <object>
4. 'create' 'POST' <uri> null

In the 4th row the last group is null because the final token (i.e. <object>) is missing.

Upvotes: 1