Wundwin Born

Reputation: 3475

Difference between ? or | in regular expression

From Wiki,

Here is the code I tried:

String yuStr =  "-7.00, 10.00, 0.00, -212.000";

//Using ?
System.out.println(yuStr.replaceAll("-?(\\d+)\\.\\d+", "$1"));

//Using |
System.out.println(yuStr.replaceAll("-|(\\d+)\\.\\d+", "$1"));

However, both ways give the same output:

7, 10, 0, 212

Is my assumption right that, according to the Wiki doc above, the second way (with |) is a choice between the expression before the operator (-) and the expression after the operator ((\\d+)...)?

My question is: why does the second way (with |) work the same as the first one (with ?)?

Upvotes: 0

Views: 110

Answers (4)

nhahtdh

Reputation: 56809

They are not the same, they just happen to produce the same output for your example.

The 4 matches in the first regex:

-7.00, 10.00, 0.00, -212.000
-1---  |   |  |  |  |      |
       11---  |  |  |      |
              1---  |      |
                    -111----

In each match, the capturing group captures the integer part of the decimal number, and all four matches are replaced by the content of the capturing group.

The 6 matches in the second regex:

-7.00, 10.00, 0.00, -212.000
-|  |  |   |  |  |  ||     |
 1---  |   |  |  |  ||     |
       11---  |  |  ||     |
              1---  ||     |
                    -|     |
                     111----

The capturing group captures text in only 4 of the matches. In the other 2 matches (the bare - signs), the capturing group doesn't capture anything.

In Oracle/OpenJDK, referring in the replacement string to a group that captured nothing yields an empty string. Therefore, the - characters are simply removed, and for this particular input, the end result is the same.

However, for input such as --7.3, the result will be different: on Oracle/OpenJDK the first regex gives -7, while the second gives 7.

And in Android and GNU Classpath's implementation, a group that captures nothing is rendered as null when referred to in the replacement string, so your output string in the second case will contain null.
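To see the matches from the diagrams above for yourself, here is a minimal sketch that lists each match next to what group 1 captured. The class name MatchLister and the helper describeMatches are made up for illustration; only Pattern and Matcher come from the JDK. The printed values in the comments are what I would expect on Oracle/OpenJDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class MatchLister {
    // Returns one entry per match, formatted as "matched text -> group 1".
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // group(1) returns null when the capturing group took no part in the match
            out.add(m.group() + " -> " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String yuStr = "-7.00, 10.00, 0.00, -212.000";
        // 4 matches: [-7.00 -> 7, 10.00 -> 10, 0.00 -> 0, -212.000 -> 212]
        System.out.println(describeMatches("-?(\\d+)\\.\\d+", yuStr));
        // 6 matches: [- -> null, 7.00 -> 7, 10.00 -> 10, 0.00 -> 0, - -> null, 212.000 -> 212]
        System.out.println(describeMatches("-|(\\d+)\\.\\d+", yuStr));
    }
}
```

The two "- -> null" entries are the matches where the group captured nothing, which is exactly where the implementations can disagree about what $1 expands to.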

Upvotes: 1

TheLostMind

Reputation: 36304

? and | serve different purposes. For example:

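public class Main {
    public static void main(String[] args) {
        String yuStr = "--7";
        // -?\d matches a digit with an optional - directly in front of it
        System.out.println("1 : " + yuStr.replaceAll("-?\\d", "")); // removes only the second - and the 7
        // -|\d matches either a single - or a single digit
        System.out.println("2 : " + yuStr.replaceAll("-|\\d", "")); // removes every - and every digit
    }
}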

produces the output:

1 : -  // first - is not replaced.
2 :    // empty string

It is just a coincidence that in your case they produce the same output. Note: in regex, it is not surprising to have 10 different ways (expressions) to get the same output :)

Upvotes: 2

aioobe

Reputation: 421020

The first expression says

     a decimal number, possibly preceded by a -

The second expression says

     A - or a decimal number

When using the expressions as the argument to replaceAll, you'll (coincidentally) get the same result in this case, since you set $1 as the replacement value.
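One way to see that the match is only hidden by the choice of $1 is to swap in a replacement that shows the whole match. The sketch below wraps every match in square brackets using $0 (the entire match); the class ReplacementProbe and the method bracketMatches are names invented for this example.

```java
// Hypothetical helper class, not part of the original question.
class ReplacementProbe {
    // Wraps every match in square brackets so the match boundaries become visible.
    static String bracketMatches(String regex, String input) {
        return input.replaceAll(regex, "[$0]"); // $0 refers to the whole match
    }

    public static void main(String[] args) {
        String yuStr = "-7.00, 10.00, 0.00, -212.000";
        // prints [-7.00], [10.00], [0.00], [-212.000]
        System.out.println(bracketMatches("-?(\\d+)\\.\\d+", yuStr));
        // prints [-][7.00], [10.00], [0.00], [-][212.000]
        System.out.println(bracketMatches("-|(\\d+)\\.\\d+", yuStr));
    }
}
```

With this replacement the two expressions no longer agree: the second one treats each - as a separate match of its own.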

Upvotes: 2

NPE

Reputation: 500367

The two expressions have different semantics, even though they happen to produce the same results for this particular input.

On some other inputs their results will differ. Consider, for example, 1-2. The first would return 1-2, whereas the second would return 12.
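A quick sketch to check the 1-2 case, reusing the two expressions from the question. The class SignStripper and its method names are made up for illustration, and the expected results in the comments assume Oracle/OpenJDK, where an unmatched group expands to an empty string.

```java
// Hypothetical helper class, not part of the original question.
class SignStripper {
    // First expression: a decimal number, optionally preceded by a -
    static String withOptional(String s) {
        return s.replaceAll("-?(\\d+)\\.\\d+", "$1");
    }

    // Second expression: either a lone - or a decimal number
    static String withAlternation(String s) {
        return s.replaceAll("-|(\\d+)\\.\\d+", "$1");
    }

    public static void main(String[] args) {
        // No decimal number in the input, so nothing matches: prints 1-2
        System.out.println(withOptional("1-2"));
        // The lone - matches and is replaced by the empty group 1: prints 12
        System.out.println(withAlternation("1-2"));
    }
}
```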

Upvotes: 2
