user1413457

Reputation: 89

Translating regular expression with conditional into Java

I am trying to translate this regular expression into Java:

^(\s*([<>]=?)?\s*!?(?:(2)[0-9]{1,5}|[0-9\*]{1,5})\s*(&|$))*

I know, of course, that Java does not support conditionals, and a direct translation throws an exception. I would like ideas on how to solve the problem.

Thanks,

Upvotes: 2

Views: 401

Answers (1)

tchrist

Reputation: 80384

First, I think you have a bug in your pattern:

^(\s*([<>]=?)?\s*!?(?:(2)[0-9]{1,5}|[0-9\*]{1,5})\s*(&|$))*

You seem to have a colon in front of your test of group 2, which won’t do what you want. That would need to be:

^(\s*([<>]=?)?\s*!?(?(2)[0-9]{1,5}|[0-9\*]{1,5})\s*(&|$))*

But there are other oddities that don’t make much sense to me. I’ll rewrite your pattern in (?x) mode so that we can unravel it and try to make some sense of it. Oh, and I’ll get rid of that extraneous backslash in the [0-9\*] in the or-branch of your conditional, since it should really be just [0-9*].

That produces this:

(?x)                       # enable comments and whitespace
^                          # anchor to beginning of string
(                          # begin GROUP #1 {
    \s *                   #     any amount of whitespace, including none
    (                      #     begin GROUP #2 {
        [<>]               #        exactly one of either kind of pointy bracket
        = ?                #        optional equals sign
    ) ?                    #     } end GROUP #2, make optional
    \s *                   #     any amount of whitespace, including none
    ! ?                    #     optional exclamation point
    (?(2)                  #     if GROUP#2 is defined {
          [0-9]   {1,5}    #         then: 1-5× ASCII digits
     |    [0-9*]  {1,5}    #         else: 1-5× of either star or ASCII digit
    )                      #     } end ifdef GROUP#2
    \s *                   #     any amount of whitespace, including none
    (                      #     begin GROUP#3 {
        &                  #        either:  an ampersand
      | $                  #        or else: end of string
    )                      #     } end GROUP#3
) *                        # } end GROUP #1, make optional but allow repeats

As near as I can tell, that is what you are actually trying to do. Why you are doing it, I have no idea, because there is stuff there that seems odd.

For example, why apply a repetition operator to the first capture group? It won’t hold all the repetitions, only the final one.
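To illustrate the point about repeated capture groups, here is a minimal Java sketch. The class name, method names, and the simplified patterns are made up for illustration and are not taken from the question; the second method shows one possible way to keep every term, by matching a single term and looping with find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are simplified stand-ins.
class RepeatedGroupDemo {
    // Repetition applied to the capture group itself: group 1 is overwritten
    // on every pass, so only the last repetition survives.
    static String lastRepetition(String input) {
        Matcher m = Pattern.compile("^([0-9]+&?)*").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // One way to keep every term: match a single term and loop with find().
    static List<String> allTerms(String input) {
        List<String> terms = new ArrayList<>();
        Matcher m = Pattern.compile("([0-9]+)(?:&|$)").matcher(input);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(lastRepetition("1&22&333")); // only the final repetition
        System.out.println(allTerms("1&22&333"));       // every term
    }
}
```

With the input "1&22&333", the first method returns only "333", while the loop collects "1", "22", and "333".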

Another question is why allow for zero repeats of group one? Just as all possible strings are matched by the pattern ^a*, so too are all possible strings matched by your pattern, because zero repetitions always succeed with an empty match. This seems less than useful.
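The zero-repeat behaviour is easy to check in Java. This is only a small sketch with a made-up helper name; the ^a+ variant is included purely for contrast and is not a suggested fix for the original pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the "matches everything" observation.
class ZeroRepeatDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // ^a* succeeds on a string with no 'a' at all, with an empty match.
        System.out.println("[" + firstMatch("^a*", "xyz") + "]");
        // ^a+ requires at least one 'a', so it finds nothing here.
        System.out.println(firstMatch("^a+", "xyz"));
    }
}
```

A find() with ^a* on "xyz" succeeds with the empty string, which is why a pattern whose outer group may repeat zero times cannot reject anything on its own.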

Lastly, having either an ampersand or the end-of-string is pretty weird towards the end there.

If you would clarify your intent, I will translate this into something that works with Java regexes, which do not support the conditional construct you’ve used here. Perl, PHP, PCRE, and C# all support it, but Java does not. (What language did this come out of, anyway?) The way to do that is to unroll the conditional into an or-branch in which both cases are covered.
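As a rough idea of what such an unrolling could look like in Java, here is a sketch under stated assumptions, not a verified translation. It assumes that each term should be judged on its own (operator present: digits only; no operator: digits or stars) and that the whole input must consist of such terms, which is why it calls matches() rather than find(). The class and method names are invented. Its behaviour may differ from the original when a later term omits the operator, depending on how the source engine treats a capture left over from an earlier repetition.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the conditional (?(2)A|B) unrolled into two branches.
class UnrolledConditional {
    // Pattern.COMMENTS is Java's equivalent of (?x) mode.
    static final Pattern TERMS = Pattern.compile(
        "^                                   # anchor to beginning of string\n"
      + "(?: \\s*                            # each term may start with whitespace\n"
      + "    (?: [<>]=? \\s* !? [0-9]{1,5}   # operator present, digits only\n"
      + "      |            !? [0-9*]{1,5}   # no operator, digits or stars\n"
      + "    )\n"
      + "    \\s* (?: & | $ )                # ampersand or end of string\n"
      + ")*",
        Pattern.COMMENTS);

    // Assumption: the entire input must be made of terms, so use matches().
    // Note that the empty string is still accepted (zero terms).
    static boolean isValid(String input) {
        return TERMS.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"<= 100 & 2*", "!42", "< 2*", "123456"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Under these assumptions, "<= 100 & 2*" and "!42" are accepted, while "< 2*" (a star after an operator) and "123456" (six digits) are rejected. The groups are non-capturing because the unrolled form no longer needs to test group 2.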

I am a bit dubious about the entire pattern, because it doesn’t seem sensible. Some sample inputs it is supposed to match would be appreciated.

One thing I cannot stress strongly enough is that the /x-expanded version of the regex that I have provided is the only way you should ever, ever, ever write these things. That gobbledygook without whitespace, indentation, logical grouping, and comments is completely unacceptable. Things like this should never pass code review. They are abominations.

And they don’t have to be. I beg you to always always always use /x mode for any regex of non-trivial length and complexity, like this one. Try to think of those who will come after you, hopefully before they do so.

Lastly, I wonder why this uses numbered groups instead of the more mnemonic named groups, which are much more robust. Plus Java 7 finally supports named groups, so you wouldn’t have to compromise there.
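For reference, a small sketch of named groups in Java 7 and later. The group names (op, negated, value) and the class are invented for illustration, and the pattern is a simplified single-term version that ignores the conditional, so treat it as a demonstration of the syntax rather than a translation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of (?<name>...) groups; names are illustrative only.
class NamedGroupDemo {
    static final Pattern TERM = Pattern.compile(
        "\\s*(?<op>[<>]=?)?\\s*(?<negated>!)?(?<value>[0-9*]{1,5})\\s*(?:&|$)");

    // Describes the first term found, or returns null if there is none.
    static String describe(String input) {
        Matcher m = TERM.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group("op") is null when the optional operator did not participate.
        return "op=" + m.group("op")
             + " negated=" + (m.group("negated") != null)
             + " value=" + m.group("value");
    }

    public static void main(String[] args) {
        System.out.println(describe("<= !123 &"));
        System.out.println(describe("42"));
    }
}
```

Reading m.group("op") survives reordering or adding groups, whereas m.group(2) silently starts pointing at something else.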

Upvotes: 2
