Reputation: 8700
I need to parse out writeln("test");
from a string.
I was using (?<type>writeln)\((?<args>[^\)]*)\);
as the regex, but it isn't perfect: if you try to parse writeln("heloo :)");
or something similar, the regex won't match (because of the ')' inside the quotes). Is there a way to make the regex recognize that the ')' is inside quotation marks, skip it, and look for the next ')'?
Thanks,
Max
Upvotes: 1
Views: 103
Reputation: 3265
The following will match patterns like writeln("hello :) \"world\"!");
string regex = "(?<type>writeln)\\(\"(?<args>(\\\\\"|[^\"])*)\"\\);";
I'm assuming this is only for single arguments.
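As a quick check of the pattern above, here is a minimal sketch, assuming C#/.NET (which the string declaration suggests). The class name EscapedQuoteDemo and the Parse helper are made up for illustration. Note that the captured args text keeps its backslashes; the regex does not unescape anything.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapedQuoteDemo
{
    // Pattern copied verbatim from the answer above.
    const string Pattern = "(?<type>writeln)\\(\"(?<args>(\\\\\"|[^\"])*)\"\\);";

    // Hypothetical helper: runs the pattern against the input.
    public static Match Parse(string input) => Regex.Match(input, Pattern);

    static void Main()
    {
        // Verbatim string: "" is one double quote, backslashes are literal.
        // The input is: writeln("hello :) \"world\"!");
        string input = @"writeln(""hello :) \""world\""!"");";

        Match m = Parse(input);
        if (m.Success)
        {
            Console.WriteLine(m.Groups["type"].Value); // the function name
            Console.WriteLine(m.Groups["args"].Value); // raw text between the outer quotes
        }
    }
}
```

The escaped-quote alternative is listed first in the alternation, so a backslash followed by a quote is consumed as a pair before the plain "any character except a quote" branch is tried.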
Upvotes: 1
Reputation: 70344
Why not write a little parser for this? Just loop through the characters and have a simple state machine for parsing.
This kind of problem is hard to solve with regular expressions because the problem (grammar) is not regular. Search SO for "parsing HTML with regex" ;)
BUT: If you control your input to a certain extent, then you might just be able to get away with regexes. See other answers here for "good enough" ways to do it.
This basically boils down to:
I do this all the time. And I hate myself for it!
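The "loop through the characters" idea can be sketched roughly as follows. This assumes C#, that the input starts with writeln(, and that a backslash escapes the next character inside a string literal. Those rules, and the names WritelnScanner and ExtractArgs, are assumptions for illustration and not something the question specifies. Nested parentheses outside quotes are not handled.

```csharp
using System;
using System.Text;

static class WritelnScanner
{
    // Hypothetical helper: returns the raw text between the parentheses of a
    // writeln(...); call at the start of input, or null if it is malformed.
    public static string ExtractArgs(string input)
    {
        const string prefix = "writeln(";
        if (!input.StartsWith(prefix, StringComparison.Ordinal)) return null;

        var args = new StringBuilder();
        bool inString = false; // state: currently inside a "..." literal
        bool escaped = false;  // state: previous char was a backslash in a literal

        for (int i = prefix.Length; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                args.Append(c);
                if (escaped) escaped = false;        // escaped char is taken literally
                else if (c == '\\') escaped = true;  // next char is escaped
                else if (c == '"') inString = false; // closing quote
            }
            else if (c == '"')
            {
                inString = true;
                args.Append(c);
            }
            else if (c == ')')
            {
                // Only a ')' outside quotes ends the call; require the trailing ';'.
                bool terminated = i + 1 < input.Length && input[i + 1] == ';';
                return terminated ? args.ToString() : null;
            }
            else
            {
                args.Append(c);
            }
        }
        return null; // input ended before an unquoted ')'
    }

    static void Main()
    {
        // The ')' inside the quotes is skipped because the scanner is in the string state.
        Console.WriteLine(ExtractArgs("writeln(\"heloo :)\");"));
    }
}
```

Two boolean flags are enough here because the only context that matters is "inside a string" and "just saw a backslash". Supporting several arguments or nested calls would need more states, or a depth counter.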
Upvotes: 2
Reputation: 625347
You've encountered the sort of problem you get using regexes to parse non-regular languages.
That being said, try:
(?<type>writeln)\((?<args>("[^"]*"|))\);
It's not perfect but nothing will be.
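To see where this pattern works and where it stops working, a small check like the following may help. It is a sketch assuming C#/.NET; QuotedArgDemo and Matches are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuotedArgDemo
{
    // Pattern from the answer above, as a verbatim string ("" is one quote).
    const string Pattern = @"(?<type>writeln)\((?<args>(""[^""]*""|))\);";

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string input) => Regex.IsMatch(input, Pattern);

    static void Main()
    {
        Console.WriteLine(Matches("writeln(\"heloo :)\");"));    // ')' inside quotes
        Console.WriteLine(Matches("writeln();"));                 // empty argument list
        Console.WriteLine(Matches("writeln(\"a \\\"b\\\"\");"));  // escaped quotes inside the string
    }
}
```

The first two inputs match. The third does not, because [^"]* stops at the first quote character whether or not a backslash precedes it.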
Upvotes: 1