Reputation: 53
I'm trying to create a regexp that can match this:
argument ::= define_scope [';' define_scope]*
define_scope ::= (['local'] | 'global') define_var
define_var ::= variable_name expression
variable_name ::= Name
So, something like local varName something;;world foo bar;;local foobar bar.
I've tried with:
((^|;;)?(local|world) (.+?) (.+?))+
but if I use this on the previous example I obtain these matches:
local varName s
;;world foo b
;;local foobar b
so it takes only the first letter of the last word of each match.
If I remove the lazy matching from the last group, it matches only:
local varName something;;world foo bar;;local foobar bar
so the last group is something;;world foo bar;;local foobar bar.
Any ideas on how to fix this?
Upvotes: 1
Views: 93
Reputation: 53
This is the regexp that I needed:
((?:(local|world) )?(.*?)(?: (.+?))(?:(?<!;);(?!;)|$))+?
This one can parse, without problems, anything that contains ;; without matching it.
Thanks anyway to all.
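For anyone who wants to try it, here is a minimal Python sketch of that regexp in use. The input string is made up for illustration, and as far as I can tell the lookarounds make a lone ; the separator, so a ;; stays inside the expression instead of ending it.

```python
import re

# The regexp from this answer, unchanged.
pattern = re.compile(
    r"((?:(local|world) )?(.*?)(?: (.+?))(?:(?<!;);(?!;)|$))+?"
)

# Hypothetical input: definitions separated by a single ';',
# with a ';;' embedded in the first expression.
text = "local a b;;c;world d e"

# group(2) = scope keyword, group(3) = variable name, group(4) = expression
results = [(m.group(2), m.group(3), m.group(4)) for m in pattern.finditer(text)]

for scope, name, expression in results:
    print(scope, name, expression)
```

The first definition keeps b;;c as its expression, because a ; that has another ; next to it is rejected by the lookbehind/lookahead pair.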
Upvotes: 1
Reputation: 239250
Regular expressions are not the be-all and end-all tool in your toolbox, and they will not suffice here, but this one can be made to work for your specific limited example by telling it to match up to (but not including) the semicolons, and removing the non-greedy ? from the last group:
/(^|;;)((local|world) (.+?) ([^;]+))/
Your problem is that . matches any character. Matching . greedily was eating up the rest of the string on the first match, while non-greedily it was satisfied with the first character. The solution is to tell it to greedily match everything except semicolons, with [^;]+. Ideally you should restrict this to the list of characters you actually expect to appear there instead of using . so freely.
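To make the difference concrete, here is a small Python sketch that runs the three variants against the example string from the question. The variable names are mine, and the third pattern is the one above with the outer groups made non-capturing so that only the keyword, name, and expression are captured.

```python
import re

text = "local varName something;;world foo bar;;local foobar bar"

# The question's attempt: the lazy last group is satisfied with one character.
lazy = re.compile(r"((^|;;)?(local|world) (.+?) (.+?))+")
# Same pattern with a greedy last group: it swallows the rest of the string.
greedy = re.compile(r"((^|;;)?(local|world) (.+?) (.+))+")
# Last group restricted to "anything but a semicolon".
fixed = re.compile(r"(?:^|;;)(local|world) (.+?) ([^;]+)")

lazy_matches = [m.group(0) for m in lazy.finditer(text)]
greedy_matches = [m.group(0) for m in greedy.finditer(text)]
fixed_matches = [m.groups() for m in fixed.finditer(text)]

print(lazy_matches)    # three matches, each cut off after one letter
print(greedy_matches)  # a single match covering the whole string
print(fixed_matches)   # one (keyword, name, expression) tuple per definition
```

Note that [^;]+ also stops at a single semicolon inside an expression, so this only works while expressions never contain one.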
Upvotes: 1
Reputation: 265161
This is not a regular grammar, and the resulting sentences/words (CS speak) cannot be parsed with a regular expression. It's a context-free grammar, and you need a parser that uses recursive descent (an LL parser).
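As a rough illustration of the hand-written approach, here is a minimal Python sketch with one function per grammar rule. It rests on several assumptions that are not in the question's grammar: the separator is ;; as in the example string, 'world' is accepted alongside 'global', a missing scope keyword defaults to 'local', and expression is simply "the rest of the definition". A real expression rule would need its own, possibly recursive, parsing function.

```python
# Assumed scope keywords: the grammar says 'global', the example uses 'world'.
SCOPE_KEYWORDS = ("local", "global", "world")


def parse_argument(text, separator=";;"):
    """argument ::= define_scope (separator define_scope)*"""
    return [parse_define_scope(part) for part in text.split(separator)]


def parse_define_scope(part):
    """define_scope ::= (['local'] | 'global') define_var"""
    part = part.strip()
    first, _, rest = part.partition(" ")
    if first in SCOPE_KEYWORDS:
        return (first,) + parse_define_var(rest)
    # No keyword present: assume the optional 'local' was omitted.
    return ("local",) + parse_define_var(part)


def parse_define_var(text):
    """define_var ::= variable_name expression"""
    name, _, expression = text.strip().partition(" ")
    if not name or not expression.strip():
        raise ValueError(f"expected 'name expression', got {text!r}")
    # Assumption: the expression is everything after the variable name.
    return name, expression.strip()


parsed = parse_argument("local varName something;;world foo bar;;local foobar bar")
for scope, name, expression in parsed:
    print(scope, name, expression)
```

Each function only looks at its own rule, so tightening the definition of expression later means changing parse_define_var alone.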
Upvotes: 2