Wietse Venema

Reputation: 2764

Regular expression doesn't match empty string in multiline mode (Java)

I just observed this behavior:

Pattern p1 = Pattern.compile("^$");
Matcher m1 = p1.matcher("");
System.out.println(m1.matches()); /* true */

Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

It strikes me as odd that the last statement is false. This is what the docs say:

By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence. http://docs.oracle.com/javase/1.4.2...

From what I understand, it should match, shouldn't it? The following makes things even more confusing:

Pattern p3 = Pattern.compile("^test$");
Matcher m3 = p3.matcher("test");
System.out.println(m3.matches()); /* true */

Pattern p4 = Pattern.compile("^test$", Pattern.MULTILINE);
Matcher m4 = p4.matcher("test");
System.out.println(m4.matches()); /* true */

So what is going on here, and how do I make sense of it? I hope someone can shed some light on this; it would be much appreciated.

Upvotes: 16

Views: 2437

Answers (3)

user557597


Sounds like a bug. At most, multi-line mode should let "^" and "$" additionally match at internal line boundaries; it should not stop them from matching at the start and end of the input. Perhaps Java doesn't keep some extended state structure the way Perl does, but I don't know whether that is even the cause.

The fact that /^test$/m matches just proves that ^ and $ work in multi-line mode except when the string is empty (in Java). Clearly, though, using multi-line mode to test for an empty string is pointless, since /^$/ works for that.

Testing in Perl, everything works as expected:

if ( "" =~ /^$/m   ) { print "/^\$/m    matches\n"; }
if ( "" =~ /^$/    ) { print "/^\$/     matches\n"; }
if ( "" =~ /\A\Z/m ) { print "/\\A\\Z/m  matches\n"; }
if ( "" =~ /\A\Z/  ) { print "/\\A\\Z/   matches\n"; }
if ( "" =~ /\A\z/  ) { print "/\\A\\z/   matches\n"; }
if ( "" =~ /^/m    ) { print "/^/m     matches\n"; }
if ( "" =~ /$/m    ) { print "/\$/m     matches\n"; }


__END__


/^$/m    matches
/^$/     matches
/\A\Z/m  matches
/\A\Z/   matches
/\A\z/   matches
/^/m     matches
/$/m     matches
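For comparison, here is a rough Java port of the same checks. It is a sketch that assumes Matcher.find() is the closest analogue of Perl's =~ (on an empty input, find() and matches() agree for these patterns), and it uses the inline (?m) flag in place of Perl's /m. The true/false values in the comments are what Java's documented MULTILINE rule for ^ ("except at end of input") predicts, and they agree with what the question observed.

```java
import java.util.regex.Pattern;

class EmptyInputChecks {
    // Returns true if the regex finds a match in the empty string,
    // mirroring Perl's  "" =~ /regex/  test.
    static boolean foundInEmpty(String regex) {
        return Pattern.compile(regex).matcher("").find();
    }

    public static void main(String[] args) {
        // (?m) is Java's inline equivalent of Perl's /m modifier.
        String[] regexes = {
            "(?m)^$",     // false: multiline ^ never matches at end of input
            "^$",         // true
            "(?m)\\A\\Z", // true
            "\\A\\Z",     // true
            "\\A\\z",     // true
            "(?m)^",      // false: same reason as the first pattern
            "(?m)$"       // true
        };
        for (String regex : regexes) {
            System.out.printf("%-10s %s%n", regex,
                    foundInEmpty(regex) ? "matches" : "does not match");
        }
    }
}
```

So the only disagreement with Perl is the multiline ^, which is exactly the case the question ran into.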

Upvotes: 1

Ingo

Reputation: 36329

If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input.

In an empty input, the beginning of input is also the end of input, so ^ can't match in multiline mode.

This is surprising, even disgusting, but it is nevertheless what the documentation says.
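A small sketch that makes the documented rule visible by counting the positions where a MULTILINE ^ matches. The helper name countCaretMatches is hypothetical; the counts in the comments follow from the rule quoted above, where the only candidate position in an empty input is also the end of input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretPositions {
    private static final Pattern LINE_START = Pattern.compile("^", Pattern.MULTILINE);

    // Hypothetical helper: counts the positions where multiline "^" matches.
    static int countCaretMatches(String input) {
        Matcher m = LINE_START.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countCaretMatches("a\nb")); // 2: start of input, and after '\n'
        System.out.println(countCaretMatches("a\n"));  // 1: after the trailing '\n' is the end of input
        System.out.println(countCaretMatches(""));     // 0: the start is also the end of input
    }
}
```

The "a\n" case is the one the rule was written for (Perl also sees no extra line after a trailing newline); the empty string just happens to fall under the same check.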

Upvotes: 9

wanderlust

Reputation: 1936

Let's look a bit closer at your second example:

Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

So the input of m2 is a line that is empty, or that contains only an end-of-line character and no other characters. Therefore your pattern, in order to match that line, should be only "$", i.e.:

// Your example
Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

// Let's check if it is start of the line
p2 = Pattern.compile("^", Pattern.MULTILINE);
m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */

// Let's check if it is end of the line
p2 = Pattern.compile("$", Pattern.MULTILINE);
m2 = p2.matcher("");
System.out.println(m2.matches()); /* true */
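If you want a single MULTILINE pattern that also treats an empty input as one empty line while still finding empty lines inside longer text, one option (my own suggestion, not something the documentation prescribes) is to allow \A as an alternative to ^, since \A matches at the beginning of input even when that is also the end. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyLines {
    // \A covers the empty-input case that multiline ^ refuses to match;
    // ^ covers every other line start. Multiline $ matches just before a
    // line terminator or at the end of input.
    static final Pattern EMPTY_LINE =
            Pattern.compile("(?:\\A|^)$", Pattern.MULTILINE);

    // Hypothetical helper: counts empty lines (no extra line is counted
    // after a trailing '\n', consistent with how ^ treats end of input).
    static int countEmptyLines(String input) {
        Matcher m = EMPTY_LINE.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(EMPTY_LINE.matcher("").matches()); // true, unlike "^$" with MULTILINE
        System.out.println(countEmptyLines("a\n\nb"));        // 1
        System.out.println(countEmptyLines("a\n"));           // 0
    }
}
```

If all you need is "is the whole input empty?", input.isEmpty() or the non-multiline "^$" is simpler.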

Upvotes: 2
