ccampisano

Reputation: 31

Problems making a regex more readable than its literal

The following regex has been validated on regex101 and works fine, matching any of "()", "[]" or "{}":

\(\)|\[]|\{}

However, I'd like to make it more readable by using Unicode escapes (which I assumed would avoid the regex escaping) and named constants, defining it like this:

private static final String MATCH_OPENING_BRACE = "\u0028";
private static final String MATCH_CLOSING_BRACE = "\u0029";

private static final String MATCH_OPENING_SQUARE_BRACE = "\u005B";
private static final String MATCH_CLOSING_SQUARE_BRACE = "\u005D";

private static final String MATCH_OPENING_CURLY_BRACE = "\u007B";
private static final String MATCH_CLOSING_CURLY_BRACE = "\u007D";

private static final String MATCHING_OR_FLAG = "|";

private static final String COMPLETE_REGEX = 
    MATCH_OPENING_BRACE + MATCH_CLOSING_BRACE 
    + MATCHING_OR_FLAG + MATCH_OPENING_SQUARE_BRACE + MATCH_CLOSING_SQUARE_BRACE
    + MATCHING_OR_FLAG + MATCH_OPENING_CURLY_BRACE + MATCH_CLOSING_CURLY_BRACE;

private static final String REGEX_REPLACEMENT = ""; 

so that I can write readable code like this:

@Override
public boolean isValid(String input) {

    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll(COMPLETE_REGEX, REGEX_REPLACEMENT);

    return input.isEmpty();
}

instead of using that unreadable literal, like this:

@Override
public boolean isValid(String input) {

    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll("\\(\\)|\\[]|\\{}", "");

    return input.isEmpty();
}

But with the constants version, the following exception is thrown:

java.util.regex.PatternSyntaxException: Unclosed character class near index 7
    ()|[]|{}
          ^

I tried adding an escape char, like this:

private static final String MATCH_OPENING_CURLY_BRACE = "\\\u007B";

but that only gives a similar exception:

java.util.regex.PatternSyntaxException: Unclosed character class near index 8
    ()|[]|\{}
           ^

Any hints?

Upvotes: -7

Views: 83

Answers (2)

ccampisano

Reputation: 31

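As @user85421 mentioned, using Unicode escapes does not make the regex escapes unnecessary, as I thought it would. The Java compiler translates a Unicode escape such as \u0028 into the plain character ( before the regex engine ever sees the string.

So escaping (, ), [ and { is still required. Here's a fix: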

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "]";
    
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "}";

The above works fine, as does the following version with all the metacharacters escaped (see @g00se's comment below):

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "\\]";
    
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "\\}";

Indeed, the original regex did NOT escape ALL the metacharacters, as you can see, and it still works fine:

    input = input.replaceAll("\\(\\)|\\[]|\\{}", "");

All the tests also run fine with all the metacharacters escaped:

    input = input.replaceAll("\\(\\)|\\[\\]|\\{\\}", "");
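
To make the point about Unicode escapes concrete, here is a minimal sketch. The class name UnicodeEscapeDemo and the helper compiles are hypothetical names introduced only for this illustration. It checks that the Unicode-escaped constants end up as the plain string ()|[]|{}, that this string is rejected by the regex engine, and that the backslash-escaped version is accepted:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the name and helper methods are illustrative only.
class UnicodeEscapeDemo {

    // The constants from the question, joined: after compilation this is just "()|[]|{}".
    static final String UNESCAPED = "\u0028\u0029|\u005B\u005D|\u007B\u007D";

    // The regex-escaped version from the answer above.
    static final String ESCAPED = "\\(\\)|\\[]|\\{}";

    // Returns true if the regex engine accepts the given pattern string.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Same loop as in the question, using the escaped pattern.
    static boolean isValid(String input) {
        for (int i = input.length() / 2; i > 0; i--) {
            input = input.replaceAll(ESCAPED, "");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(UNESCAPED.equals("()|[]|{}")); // true
        System.out.println(compiles(UNESCAPED));          // false
        System.out.println(compiles(ESCAPED));            // true
        System.out.println(isValid("([]{})"));            // true
        System.out.println(isValid("(]"));                // false
    }
}
```

If named constants are still wanted for readability, they would need to hold the backslash-escaped forms, as in the fix above.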

Upvotes: -1

user85421

Reputation: 29670

Maybe use comments and whitespace to explain and format the expression:

String regex = """
    (?x)     # allows comments and ignores whitespace
      \\(\\) # ()  escaped
    |        # or
      \\[]   # []  escaped
    |        # or
      \\{}   # {}  escaped
    """;

The formatting can be changed to your liking.
Drawback: relevant # and spaces must also be escaped.
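
As a rough sketch of that drawback, the example below uses the Pattern.COMMENTS flag (the flag form of the embedded (?x)) with ordinary string concatenation instead of a text block, so it also compiles on older Java versions. The class name CommentsModeDemo and the ISSUE_REF pattern are made up for this illustration. A literal space and a literal # each need a backslash here, otherwise they would be treated as ignorable whitespace and as the start of a comment:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CommentsModeDemo {

    // The bracket-pair regex from above, laid out with comments.
    static final Pattern BRACKET_PAIRS = Pattern.compile(
          "  \\(\\)   # ()  escaped\n"
        + "| \\[]     # []  escaped\n"
        + "| \\{}     # {}  escaped\n",
        Pattern.COMMENTS);

    // In comments mode a literal space and a literal # must be escaped.
    static final Pattern ISSUE_REF = Pattern.compile(
          "issue \\  \\# \\d+   # the word, one space, a hash sign, digits\n",
        Pattern.COMMENTS);

    // Removes every adjacent bracket pair found in a single pass.
    static String stripPairs(String input) {
        return BRACKET_PAIRS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripPairs("a()b[]c{}"));                  // abc
        System.out.println(ISSUE_REF.matcher("issue #42").matches()); // true
        System.out.println(ISSUE_REF.matcher("issue#42").matches());  // false
    }
}
```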


For longer sequences, Pattern#quote can be used. It is probably not so useful for short sequences (like ()).
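
A minimal sketch of the Pattern#quote approach, assuming the literal pairs are kept in a plain list and joined with |. The class name QuoteDemo and the constant PAIRS_REGEX are hypothetical. Pattern.quote wraps each literal in \Q...\E, so none of the brackets needs a manual backslash:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; names are illustrative only.
class QuoteDemo {

    // Each literal is quoted, then the alternatives are joined with the regex "or".
    static final String PAIRS_REGEX = Stream.of("()", "[]", "{}")
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));

    // Same loop as in the question, using the quoted pattern.
    static boolean isValid(String input) {
        for (int i = input.length() / 2; i > 0; i--) {
            input = input.replaceAll(PAIRS_REGEX, "");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(PAIRS_REGEX);        // \Q()\E|\Q[]\E|\Q{}\E
        System.out.println(isValid("{[()]}"));  // true
        System.out.println(isValid("{[(])}"));  // false
    }
}
```

The resulting pattern string is longer than the hand-escaped one, which is why this tends to pay off mainly for longer literals.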

Upvotes: 3
