Reputation: 4831
Consider following script (it's total nonsense in pseudo-language):
if (Request.hostMatch("asfasfasf.com") && someString.existsIn(new String[] {"brr", "hrr"})) {
if (Requqest.clientIp("10.0.x.x")) {
somevar = "1";
}
somevar = "2";
}
else {
somevar = "first";
}
string foo = "foo";
// etc. etc.
How would you grab if-block's parameters and contents from it? The if-block has format of:
if<whitespace>(<parameters>)<whitespace>{<contents>}<anything>
I tried using String.split()
with regex pattern of ^if\s*\(|\)\s*\{|\}\s*
but this fails miserably. Namely, the problem is that ) {
is found also in inner if-block and the closing }
is found from many places as well. I don't think neither lazy or eager expansion works here.
So... any pointers to what might I need here in order to implement this with regex?
I also need to get the remaining string without the if-block's code (so code starting from else { ...
). Using just String.split()
seems to make it difficult as there is no information about the length of the parts that were parsed away.
I initially created a loop based solution (using String.substring()
heavily) for this, but it's dull. I would like to have something fancier instead. Should I go with regex or create a custom, generic function (there are many other cases than just this) that takes the parseable String and the pattern instead (consider the if<whitespace>(...
pattern above)?
Edit: Changed returns to variable assignments as it would have not made sense otherwise.
Upvotes: 1
Views: 255
Reputation: 12299
As per the above, you'll need a parser. One type that's easy to implement (and fun to write!) is a recursive descent parser with backtracking. There is also a plethora of parser generators out there, though most of those have a learning curve. One Java-friendly parser generator is JavaCC.
Upvotes: 1
Reputation: 168755
You'd be far better off using (or writing) a parser than trying to do this with Regex.
Regex is great for somethings, but for complex parsing like this, it sucks. Another example where it sucks that gets asked a lot here is parsing HTML - you can do it to a limited degree, but for anything complex, a DOM parser is a much better solution.
For a [very] simple parser, what you need is a recursive function that searches for a braces {
and }
, recursing down a level each time it comes across an opening brace, and returning back up a level when it finds a closing brace. It then needs to store the string contents between the two braces at each level.
Upvotes: 2
Reputation: 11814
A regular language won't work because a regular grammar can't match things like "any number of open parenthesis followed by any number of close parenthesis". A context-free grammar would be needed for that.
Unless you use a context-free grammar parser for Java or a regular expression extension that makes regular expressions no longer regular, your loop-based solution is probably the fanciest solution.
Upvotes: 1