mohsyn
mohsyn

Reputation: 298

Regex function to find specific depth in recursive

I have the following scenario where I am supposed to use regex (Java/PCRE) on a line of code and strip off certain defined function and only strong the value of that function like in example below:

Input ArrayNew(1) = adjustalpha(shadowcolor, CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))

Output : Replace Regex

ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)

Here CInt, Math.Truncate, and ObjectToNumber is removed retaining on output as shown above

The functions CInt, Math.Truncate keep on changing to CStr or Math.Random etc etc so regex query can not be hardcoded.

I tried a lot of options on stackoverflow but most did not work.

Also it would be nice if the query is customizable like Cint returns everything function CInt refers to. ( find a text then everything between first ( and ) ignoring balanced parenthesis pairs in between.

Upvotes: 1

Views: 238

Answers (1)

scriber36
scriber36

Reputation: 135

I know it's not pretty, but it's your fault to use raw regex for this :)

@Test
void unwrapCIntCall() {
    String input = "ArrayNew(1) = adjustalpha(shadowcolor, CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))";
    String expectedOutput = "ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)";

    String output = input.replaceAll("CInt\\s*\\(\\s*Math\\.Truncate\\s*\\(\\s*ObjectToNumber\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)", "$1");
    assertEquals(expectedOutput, output);
}

Now some explanation; the \\s* parts allow any number of any whitespace character, where they are. In the pattern, I used (.*) in the middle, which means I match anything there, but it's fine*. I used (.*) instead of .* so that particular section gets captured as capturing group $1 (because $0 is always the whole match). The interesting part being captured, I can refer them in the replacement string.

*as long as you don't have multiple of such assignments within one string. Otherwise, you should break up the string into parts which contain only one such assignment and apply this replacement for each of those strings. Or, try (.*?) instead of (.*), it compiles for me - AFAIK that makes the .* match as few characters as possible.

If the methods actually being called vary, then replace their names in the regex with the variation you expect, like replace CInt with (?CInt|CStr), Math\\.Truncate with Math\\.(?Truncate|Random) etc. (Using (? instead of ( makes that group non-capturing, so they won't take up $1, $2, etc. slots).

If that gets too complicated, than you should really think whether you really want to do it with regex, or whether it'd be easier to just write a relatively longer function with plain string methods, like indexOf and substring :)

Bonus; if absolutely everything varies, but the call depth, then you might try this one:

String output = input.replaceAll("[\\w\\d.]+\\s*\\(\\s*[\\w\\d.]+\\s*\\(\\s*[\\w\\d.]+\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)", "$1");

Yes, it's definitely a nightmare to read, but as far as I understand, you are after this monster :)

You can use ([^()]*) instead of (.*) to prevent deeper nested expressions. Note, that fine control of depth is a real weakness of everyday regular expressions.

Upvotes: 1

Related Questions