r.r

Reputation: 255

Does Java support if-then-else regex constructs (Perl-style conditionals)?

I get a PatternSyntaxException when I try to compile the following regex:

"bd".matches("(a)?b(?(1)c|d)")

This regex should match bd and abc, but not bc.

Any ideas? Thanks.

OK, I need to write a regex that matches these 4 strings:

*date date* date date1*date2

should not match:

*date* date1*date2* *date1*date2 date** ...

But this should be done with a single match, not several.

Please do not post an answer like:

(date*date)|(*date)|(date*)|(date)

Upvotes: 7

Views: 4606

Answers (6)

Alan Moore

Reputation: 75222

Java doesn't support conditionals, but there's a trick you may be able to use in its place. Check it out:

String[] test = { "abc", "abd", "bc", "bd", "ad", "ac" };
for (String s : test)
{
  System.out.printf("%-4s: %b%n", s, s.matches("(?:a())?b(\\1c|(?!\\1)d)"));
}

output:

abc : true
abd : false
bc  : false
bd  : true
ad  : false
ac  : false

If the string doesn't start with a, the first capturing group doesn't participate in the match and the backreference \1 fails, just like the (1) in your conditional group. Otherwise it matches an empty string, same as the group did.

The other aspect of a conditional is that it performs an exclusive OR; if the condition is true, the second branch should not succeed (so abd should not match). The negated backreference in the second branch achieves that.

This trick works in almost all of the popular, Perl-derived flavors, including Java, .NET, Python, PHP (PCRE), and Ruby (Oniguruma). It doesn't work in ECMAScript implementations like JavaScript and ActionScript.


EDIT: Okay, you've added some sample strings and @sln has shown how to match them with pseudo-conditionals, but I wonder if you really need them. Your "valid" strings seem to consist of at least one date interspersed with at most one *, which can be expressed as

^(?:\*date|date(?:\*(?:date)?)?)$

Here's a demo that includes @sln's regex as well as mine.
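
In case the demo link is unavailable, here is a minimal Java sketch of such a comparison. The class and method names (DatePatternDemo, pseudo, plain) are made up for illustration, and the sample strings are the ones from the question. The second pattern wraps its alternation in a non-capturing group so that ^ and $ apply to both alternatives.

```java
import java.util.regex.Pattern;

class DatePatternDemo {
    // Pseudo-conditional version: the empty group () records whether a leading '*' was seen.
    static final Pattern PSEUDO_CONDITIONAL =
        Pattern.compile("^(?:[*]())?date(?:(?!\\1)[*](?:date)?|)$");

    // Conditional-free version: plain alternation, anchored as a whole.
    static final Pattern PLAIN =
        Pattern.compile("^(?:\\*date|date(?:\\*(?:date)?)?)$");

    static boolean pseudo(String s) {
        return PSEUDO_CONDITIONAL.matcher(s).matches();
    }

    static boolean plain(String s) {
        return PLAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "*date", "date*", "date", "date*date",          // expected to match
            "*date*", "date*date*", "*date*date", "date**"  // expected not to match
        };
        for (String s : samples) {
            System.out.printf("%-11s pseudo=%b plain=%b%n", s, pseudo(s), plain(s));
        }
    }
}
```

Both patterns should agree on every sample; if they ever disagree on your real data, that is a hint the "date" placeholder hides more structure than the samples show.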

Upvotes: 3

user557597

Reputation:

Adding a new answer based on the OP's edit and samples:

ok i need to write regex to match next 4 strings:
*date date* date date1*date2
should not match:
*date* date1*date2* *date1*date2 date** ...

If I understand you correctly, you could use a regex based on Alan Moore's pseudo-conditional trick.

Something like this ^(?:[*]())?date(?:(?!\1)[*](?:date)?|)$ might work.
I am assuming 'date' is the only text in the samples, and that each group of non-space characters in the samples is a distinct line of text.

In your text that passes, there is only one form that requires a pseudo-conditional: 'date*date'. So I've included a Perl sample below (since I don't have a Java compiler) that expands the regex for clarity.

use strict;
use warnings;

my @samps = qw(

*date
 date*
 date
 date*date
*date*
 date*date*
*date*date
 date**
);

for my $str (@samps)
{
   print "\n'$str'\n";

   if ($str =~
       /
        ^          # Begin of string
        (?:             # Expr grouping
            [*]()          # Asterisk found then DEFINE capture group 1 as empty string
        )?              # End expr group, optional, if asterisk NOT found, capture group 1 stays UNDEFined
        date            # Literal 'date'
        (?:             # Expr grouping
            (?!\1)           # Pseudo conditional: If no asterisk (group 1 is UNDEF), then
            [*](?:date)?     # look for '*' followed by optional 'date'
          |               # OR,
        )                    # Asterisk or not, should be nothing here
        $          # End of string
      /x)

     {
         print "matched: '$str'\n";
     }
}

Output:

'*date'
matched: '*date'

'date*'
matched: 'date*'

'date'
matched: 'date'

'date*date'
matched: 'date*date'

'*date*'

'date*date*'

'*date*date'

'date**'

Upvotes: 2

tchrist

Reputation: 80384

Imagine, if you can, a language that lacked an else statement, but in which you wanted to emulate one. Instead of writing

if (condition) { yes part }
else           { no part  }

You would have to write

if (condition)   { yes part }
if (!condition)  { no part  }

Well, that’s what you have to do here, but in the pattern. In Java, which has no conditionals, you repeat the condition, negated, in the ELSE branch, which is really just an OR branch.

So for example, instead of writing this in a language like Perl with conditional support in pattern:

# definition of \b using a conditional in the pattern like Perl
#
(?(?<=      \w)     # if there is a word character to the left
      (?!   \w)     #    then there must be no word character to the right
  |   (?=   \w)     #    else there must be a  word character to the right
)

You must in Java write:

# definition of \b using a duplicated condition like Java
#
(?:   (?<=  \w)     # if there is a word character to the left
      (?!   \w)     #    then there must be no word character to the right
  |                 # ...otherwise...
      (?<!  \w)     # if there is no word character to the left
      (?=   \w)     #    then there must be a word character to the right
)
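
As a rough check, the duplicated-condition form can be compared against the built-in \b. This is only a sketch: the class and helper names are invented here, and it assumes ASCII-only input, since Java's built-in \b and \w have not always agreed on non-ASCII characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hand-expanded \b: the lookbehind condition is repeated, negated, in the second branch.
    static final Pattern EXPANDED_B = Pattern.compile(
        "(?:(?<=\\w)(?!\\w)"       // word char on the left, none on the right
        + "|(?<!\\w)(?=\\w))");    // no word char on the left, one on the right

    static final Pattern BUILTIN_B = Pattern.compile("\\b");

    // Collects the start offset of every zero-width match in the input.
    static List<Integer> positions(Pattern p, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "hi there, x_1";
        System.out.println("expanded: " + positions(EXPANDED_B, input));
        System.out.println("built-in: " + positions(BUILTIN_B, input));
    }
}
```

The two lists should be identical for ASCII text, which is the point of the exercise: the alternation with a negated, repeated condition behaves like the conditional.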

You may recognize that as being the definition of \b. Here, similarly, is \B’s definition, first using conditionals:

# definition of \B using a conditional in the pattern like Perl
#
(?(?<=      \w)     # if there is a word character to the left
      (?=   \w)     #    then there must be a  word character to the right
  |   (?!   \w)     #    else there must be no word character to the right
)

And now by repeating the (now negated) condition in the OR branch:

# definition of \B using a duplicated condition like Java
#
(?:   (?<=  \w)     # if there is a word character to the left
      (?=   \w)     #    then there must be a word character to the right
  |                 # ...otherwise...
      (?<!  \w)     # if there is no word character to the left
      (?!   \w)     #    then there must be no word character to the right
)
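
One way to sanity-check a hand-expanded \B is to confirm that it matches at exactly the offsets where the expanded \b does not, so that the two split every position in the string between them. The sketch below does that in Java; the class and method names are hypothetical, and the \B pattern used is (?<=\w)(?=\w) | (?<!\w)(?!\w).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonBoundaryDemo {
    // Expanded \b: a word character on exactly one side.
    static final Pattern EXPANDED_B = Pattern.compile(
        "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))");

    // Expanded \B: word characters on both sides, or on neither.
    static final Pattern EXPANDED_NOT_B = Pattern.compile(
        "(?:(?<=\\w)(?=\\w)|(?<!\\w)(?!\\w))");

    // Collects the start offset of every zero-width match in the input.
    static Set<Integer> positions(Pattern p, String input) {
        Set<Integer> out = new TreeSet<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    // True when every offset 0..length is matched by exactly one of the two patterns.
    static boolean partitions(String input) {
        Set<Integer> b = positions(EXPANDED_B, input);
        Set<Integer> notB = positions(EXPANDED_NOT_B, input);
        for (int i = 0; i <= input.length(); i++) {
            if (b.contains(i) == notB.contains(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(positions(EXPANDED_NOT_B, "ab  c"));
        System.out.println(partitions("hi there, x_1"));
    }
}
```

If the \B expansion were accidentally a copy of the \b one, partitions would return false, which makes this a cheap guard against copy-paste slips.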

Notice how, no matter how you roll them, the respective definitions of \b and \B alike rest solely on the definition of \w, never on \W, let alone on \s.

Being able to use conditionals not only saves typing, it also reduces the chance of doing it wrong. There may also be occasions where you do not care to evaluate the condition twice.

Here I make use of that to define several regex subroutines that provide me with a Greeklish atom and boundaries for the same:

(?(DEFINE)
    (?<greeklish>            [\p{Greek}\p{Inherited}]   )
    (?<ungreeklish>          [^\p{Greek}\p{Inherited}]  )
    (?<greek_boundary>
        (?(?<=      (?&greeklish))
              (?!   (?&greeklish))
          |   (?=   (?&greeklish))
        )
    )
    (?<greek_nonboundary>
        (?(?<=      (?&greeklish))
              (?=   (?&greeklish))
          |   (?!   (?&greeklish))
        )
    )
)

Notice how the boundary and nonboundary definitions use only (?&greeklish), never (?&ungreeklish)? You don’t ever need the non-whatever just to do boundaries. You put the not into your lookarounds instead, just as \b and \B both do.

Although in Perl it’s probably easier (albeit less general) just to define a new, custom property, \p{IsGreeklish} (and hence its complement \P{IsGreeklish}):

 sub IsGreeklish {
     return <<'END';
 +utf8::IsGreek
 +utf8::IsInherited
 END
 }

You won’t be able to translate either of those into Java though, albeit not so much because of Java’s lack of support for conditionals, but rather because its pattern language doesn’t allow (DEFINE) blocks or regex subroutine calls like (?&greeklish) — and indeed, your patterns cannot even recurse in Java. Nor can you in Java define custom properties like \p{IsGreeklish}.
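
That said, the effect of the Greeklish boundary can be approximated in Java by assembling the pattern from an ordinary string fragment and duplicating the condition. This is a rough sketch, not a translation of the (DEFINE) block: the names are invented, and it assumes a Java version (7 or later) whose Pattern accepts the script properties \p{IsGreek} and \p{IsInherited}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreeklishBoundaryDemo {
    // Stand-in for the (?<greeklish> ...) subroutine: just a reusable string fragment.
    static final String GREEKLISH = "[\\p{IsGreek}\\p{IsInherited}]";

    // Boundary: a Greeklish character on exactly one side (condition duplicated and negated).
    static final Pattern GREEK_BOUNDARY = Pattern.compile(
        "(?:(?<=" + GREEKLISH + ")(?!" + GREEKLISH + ")"
        + "|(?<!" + GREEKLISH + ")(?=" + GREEKLISH + "))");

    // Collects the start offset of every zero-width boundary match.
    static List<Integer> boundaries(String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = GREEK_BOUNDARY.matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Greek letters (alpha, beta, gamma), a space, then Latin text.
        System.out.println(boundaries("\u03b1\u03b2\u03b3 abc"));
    }
}
```

String concatenation is a much blunter tool than regex subroutines: the fragment is pasted in four times, and nothing like it helps with recursion.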

And of course conditionals in Perl regexes can be more than lookarounds: they can even be code blocks to execute — which is why you certainly don’t want to be forced to evaluate the same condition twice, lest it have side-effects. That doesn’t apply to Java, because it can’t do that. You can’t intermix pattern and code, which limits you more than you might think before you get in the habit of doing so.

There are really a huge whole lot of things you can do with the Perl regex engine that you can do in no other language, and this is just some of that. It’s no wonder that the greatly expanded Regexes chapter in the new 4th edition of Programming Perl, when coupled with the completely rewritten Unicode chapter which now immediately follows the Regexes chapter (having been promoted into part of the inner core), have a combined page count of something like 130 pages, so double the length of the old chapter on pattern matching from the 3rd edition.

What you’ve just seen above is part of the new 4th edition, which should be in print next month or so.

Upvotes: 7

Borodin

Reputation: 126722

It is very unlikely that you cannot continue without this facility. I hope you are not falling into the common trap of trying to squeeze too much functionality into a single regex.

Please describe your problem. I am sure there is a better option than using an external library to implement the solution you have devised.

Upvotes: 1

user557597

Reputation:

According to the Wikipedia article here, in an engine comparison table, Java's regex engine doesn't do conditionals.

Upvotes: 0

user289086

Reputation:

Reading the Java 1.5 Pattern spec, the Java 1.6 Pattern spec, and the Java 7 spec, Java does not appear to have an if-then-else construct.
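
A quick way to confirm this on whatever JDK you have is to try compiling the pattern and catch the exception. The helper below is a minimal sketch (the class and method names are made up); it checks the conditional from the question and, for contrast, the backreference-based workaround from another answer.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ConditionalSupportCheck {
    // Returns true if java.util.regex accepts the pattern, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Perl-style conditional from the question.
        System.out.println(compiles("(a)?b(?(1)c|d)"));
        // Pseudo-conditional built from an empty group and backreferences.
        System.out.println(compiles("(?:a())?b(\\1c|(?!\\1)d)"));
    }
}
```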

An explanation of the regular expression in the question (along with various options for other languages that don't support conditionals) can be found at this blog post. The full explanation (and further confirmation that it isn't supported by Java) can be read at this page.

You might look for a third party library to do the pattern matching, but it won't be something that is integrated with the String class.

Upvotes: 0
