Steve81

Reputation: 53

Problem with a PCRE regexp

I'm trying to write a regexp that matches input described by this grammar:

argument ::= define_scope [';' define_scope]*
define_scope ::= (['local'] | 'global') define_var
define_var ::= variable_name expression
variable_name ::= Name

That is, something like:

local varName something;;world foo bar;;local foobar bar

I've tried:

((^|;;)?(local|world) (.+?) (.+?))+

but when I apply it to the example above, I get these matches:

local varName s
;;world foo b
;;local foobar b

so it captures only the first letter of the last word in each match.
If I remove the lazy quantifier from the last group, it matches only:

local varName something;;world foo bar;;local foobar bar

so the last group captures something;;world foo bar;;local foobar bar.
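The behaviour described above can be reproduced with Python's re module, used here only as a convenient stand-in for PCRE (both are backtracking engines, so they should agree on this pattern). finditer plays the role of a global match; TEXT, lazy and greedy are illustrative names, not part of the question.

```python
import re

TEXT = "local varName something;;world foo bar;;local foobar bar"

# The question's pattern, with a lazy final group.
lazy = re.compile(r"((^|;;)?(local|world) (.+?) (.+?))+")
# The same pattern with the lazy quantifier removed from the final group.
greedy = re.compile(r"((^|;;)?(local|world) (.+?) (.+))+")

# Lazy: nothing after (.+?) forces it to grow, so it stops after one character.
lazy_matches = [m.group(0) for m in lazy.finditer(TEXT)]
# Greedy: (.+) runs to the end of the input, so one match swallows everything.
greedy_last_groups = [m.group(5) for m in greedy.finditer(TEXT)]

print(lazy_matches)
print(greedy_last_groups)
```

The first list holds the three truncated matches, and the second holds a single last group containing the whole tail of the input.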

Any ideas on how to fix this?

Upvotes: 1

Views: 93

Answers (3)

Steve81

Reputation: 53

This is the regexp I needed:

((?:(local|world) )?(.*?)(?: (.+?))(?:(?<!;);(?!;)|$))+?

It parses input containing ;; without problems, because it never treats ;; as a separator.
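A quick way to check this regexp is to run it through Python's re module (standing in for PCRE; the lookbehind (?<!;) is fixed-width, so Python accepts it). Note that the pattern splits on a lone ; as in the grammar, so the sample input below uses single semicolons between definitions and keeps a ;; inside one value. PATTERN and parse_definitions are illustrative names.

```python
import re

# Steve81's pattern, unchanged, written as a raw string.
# Group 2 = scope keyword (None if absent), group 3 = variable name,
# group 4 = expression. A definition ends at a ';' that is not part of ';;',
# or at the end of the string.
PATTERN = re.compile(r"((?:(local|world) )?(.*?)(?: (.+?))(?:(?<!;);(?!;)|$))+?")


def parse_definitions(text):
    """Return one (scope, name, expression) tuple per regex match."""
    return [m.group(2, 3, 4) for m in PATTERN.finditer(text)]


print(parse_definitions("local varName something;world foo bar;;baz;local foobar bar"))
```

The ;; inside bar;;baz fails both lookarounds, so the lazy (.+?) keeps growing until it reaches the next lone semicolon.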

Thanks anyway to all.

Upvotes: 1

user229044

Reputation: 239250

Regular expressions are not the be-all and end-all of your toolbox, and in general they won't suffice here. This one can be made to work for your specific, limited example, though, by telling it to match up to (but not including) the semicolons and by removing the non-greedy ? from the last group:

/(^|;;)((local|world) (.+?) ([^;]+))/

Your problem is that . matches any character. Matched greedily, it ate the rest of the string on the first match; matched non-greedily, it was satisfied with the first character. The solution is to greedily match everything except semicolons, with [^;]+. Ideally you should restrict this to the set of characters you actually expect to appear there instead of using . so freely.
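Here is a small check of this pattern using Python's re module as a stand-in for PCRE (the / delimiters are dropped, since Python does not use them). TEXT, pattern and definitions are illustrative names.

```python
import re

TEXT = "local varName something;;world foo bar;;local foobar bar"

# [^;]+ is greedy but cannot cross a ';', so each value stops at the separator.
pattern = re.compile(r"(^|;;)((local|world) (.+?) ([^;]+))")


def definitions(text):
    """Return (scope, name, value) for each definition: groups 3, 4 and 5."""
    return [m.group(3, 4, 5) for m in pattern.finditer(text)]


for scope, name, value in definitions(TEXT):
    print(scope, name, value)
```

Because the value can no longer swallow a semicolon, the greedy quantifier is safe and the three definitions come back separately.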

Upvotes: 1

knittl

Reputation: 265161

This is not a regular grammar, and the sentences it generates (in CS terms) cannot be parsed with a regular expression. It is a context-free grammar, and you need a parser, for example a recursive-descent (LL) parser.
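A minimal recursive-descent sketch in Python, with one method per grammar rule. The grammar leaves expression undefined, so this assumes (hypothetically) that an expression is one or more whitespace-separated tokens running up to the next ;. It also accepts world alongside global, because the question's example uses it. Parser, Definition and SCOPES are illustrative names, not an existing library.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: 'world' is accepted as a synonym for 'global' (from the example).
SCOPES = {"local", "global", "world"}


@dataclass
class Definition:
    scope: Optional[str]
    name: str
    expression: List[str]


class Parser:
    def __init__(self, text: str):
        # Tokens are ';' or runs of characters that are neither space nor ';'.
        self.tokens = re.findall(r";|[^\s;]+", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    # argument ::= define_scope [';' define_scope]*
    def argument(self) -> List[Definition]:
        defs = [self.define_scope()]
        while self.peek() == ";":
            self.advance()
            defs.append(self.define_scope())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return defs

    # define_scope ::= (['local'] | 'global') define_var
    def define_scope(self) -> Definition:
        scope = self.advance() if self.peek() in SCOPES else None
        name, expr = self.define_var()
        return Definition(scope, name, expr)

    # define_var ::= variable_name expression
    def define_var(self) -> Tuple[str, List[str]]:
        name = self.advance()
        if name == ";" or name in SCOPES:
            raise SyntaxError(f"expected a variable name, got {name!r}")
        expr = []
        # Hypothetical expression rule: every token up to the next ';'.
        while self.peek() not in (None, ";"):
            expr.append(self.advance())
        if not expr:
            raise SyntaxError(f"missing expression for {name!r}")
        return name, expr


print(Parser("local varName something;world foo bar;local foobar bar").argument())
```

Unlike the regexps, a parser like this reports where the input stops matching the grammar, and the expression rule can later be replaced by a real sub-parser without touching the other methods.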

Upvotes: 2
