Polaris

Reputation: 396

Java regex shortest match

I have the following string: (a.1) (b.2) (c.3) (d.4). I want to change it to (1) (2) (3) (4), and I use the following method.

I call str.replaceAll("\(.*[.](.*)\)","($1)"), but I only get (4). What is the correct way to do this?

Thanks

Upvotes: 3

Views: 3205

Answers (4)

Wiktor Stribiżew

Reputation: 626747

Root cause

You want to match ()-delimited substrings, but you are using the greedy dot pattern .*, which matches any 0 or more chars other than line break chars. The \(.*[.](.*)\) pattern matches the first ( in (a.1) (b.2) (c.3) (d.4), then .* grabs the rest of the string, and backtracking starts in order to accommodate text for the subsequent obligatory subpatterns. [.] therefore finds the last . in the string, the one before the last digit, 4. Then (.*) again grabs all the rest of the string, but since a ) is required right after it, backtracking leaves the last (.*) capturing only 4. The single match thus spans the whole string, which is why the result is (4).
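To make the backtracking description concrete, here is a minimal sketch that inspects the first match of the question's pattern (with the backslashes doubled for a Java string literal). The class and method names are made up for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyMatchDemo {
    // The asker's pattern, escaped for a Java string literal.
    static final Pattern GREEDY = Pattern.compile("\\(.*[.](.*)\\)");

    // Returns the whole text consumed by the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = GREEDY.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns what Group 1 captured in the first match, or null if there is none.
    static String firstGroup(String input) {
        Matcher m = GREEDY.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "(a.1) (b.2) (c.3) (d.4)";
        System.out.println(firstMatch(s)); // the one match covers the whole string
        System.out.println(firstGroup(s)); // 4
        System.out.println(s.replaceAll(GREEDY.pattern(), "($1)")); // (4)
    }
}
```

Because there is only one match and it covers everything, replaceAll has nothing left to replace after it.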

Why is lazy / reluctant .*? not a solution?

Even if you use \(.*?[.](.*?)\), a parenthesized substring that contains no dot, such as (xxx), will be swallowed into the following match, because . matches any char except line break chars, including ( and ).
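A small sketch of that failure mode, assuming a test string with a dot-less (xxxx) group in it. The class and method names are hypothetical; the second method uses a negated character class for comparison.

```java
class LazyVersusNegatedDemo {
    // Reluctant quantifiers: . can still cross a ")" while looking for the next dot.
    static String lazy(String input) {
        return input.replaceAll("\\(.*?[.](.*?)\\)", "($1)");
    }

    // Negated character class: a match can never contain "(" or ")" inside the parentheses.
    static String negated(String input) {
        return input.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)");
    }

    public static void main(String[] args) {
        String s = "(a.1) (xxxx) (b.2)";
        System.out.println(lazy(s));    // (1) (2)         -> "(xxxx) " was swallowed
        System.out.println(negated(s)); // (1) (xxxx) (2)  -> "(xxxx)" is left alone
    }
}
```

In the reluctant version the second match starts at the ( of (xxxx) and only ends at the ) after b.2, so the whole "(xxxx) (b.2)" span is replaced by (2).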

Solution

.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)")

The [^()] class matches any char except ( and ), so a match can never run across a parenthesis boundary.

Details

  • \( - a ( char
  • [^()]* - a negated character class matching 0 or more chars other than ( and )
  • \. - a dot
  • ([^()]*) - Group 1 (its value is later referred to with $1 from the replacement pattern): any 0+ chars other than ( and )
  • \) - a ) char.

Java demo:

List<String> strs = Arrays.asList("(a.1) (b.2) (c.3) (d.4)", "(a.1) (xxxx) (b.2) (c.3) (d.4)");
for (String str : strs)
    System.out.println("\"" + str.replaceAll("\\([^()]*\\.([^()]*)\\)", "($1)") + "\"");

Output:

"(1) (2) (3) (4)"
"(1) (xxxx) (2) (3) (4)"

Upvotes: 4

The Scientific Method

Reputation: 2436

Try this one: it matches any letter, . or " and replaces each of them with the empty string ""

str.replaceAll("[a-zA-Z\\.\"]", "") 


Edit:

You can also use [^\\d)(\\s] to match every character that is not a digit, whitespace, ( or ), and replace each of them with the empty string ""

String str = "(a.1) (b.2) (c.3) (d.4)";
System.out.println(str.replaceAll("[^\\d)(\\s]",""));
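Both character-deletion patterns rely on the part before the dot containing only letters. The sketch below runs them on the question's string and on a hypothetical input, (a1.2), that has a digit before the dot. That second input is an assumption for illustration and does not appear in the question, and the class and method names are made up.

```java
class CharacterDeletionDemo {
    // Deletes letters, dots and double quotes.
    static String dropLettersAndDots(String input) {
        return input.replaceAll("[a-zA-Z\\.\"]", "");
    }

    // Deletes everything except digits, parentheses and whitespace.
    static String keepDigitsParensSpaces(String input) {
        return input.replaceAll("[^\\d)(\\s]", "");
    }

    public static void main(String[] args) {
        String simple = "(a.1) (b.2) (c.3) (d.4)";
        System.out.println(dropLettersAndDots(simple));     // (1) (2) (3) (4)
        System.out.println(keepDigitsParensSpaces(simple)); // (1) (2) (3) (4)

        // Hypothetical input: the digit before the dot survives the deletion.
        String mixed = "(a1.2)";
        System.out.println(dropLettersAndDots(mixed));      // (12)
        System.out.println(keepDigitsParensSpaces(mixed));  // (12)
    }
}
```

If the prefix can contain digits, a pattern that anchors on the parentheses and the dot is the safer choice.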

Upvotes: 2

Mark Peters

Reputation: 81074

A couple of things here. First, your escapes for the parentheses are incorrect. In Java string literals, the backslash is itself an escape character, so you need to write \\( to represent \( in the regex.

I think your question is how to do non-greedy matches in regex. Append ? to a quantifier to make it non-greedy (reluctant); e.g. *? means "zero or more times, but as few times as possible".
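As a quick illustration of the difference between * and *?, here is a minimal sketch on a made-up bracketed string. The input and the class and method names are assumptions chosen only to keep the example small.

```java
class QuantifierDemo {
    // Greedy: .* takes as much as it can, so one match spans both bracket pairs.
    static String greedy(String input) {
        return input.replaceAll("\\[.*\\]", "#");
    }

    // Reluctant: .*? takes as little as it can, so each bracket pair matches separately.
    static String reluctant(String input) {
        return input.replaceAll("\\[.*?\\]", "#");
    }

    public static void main(String[] args) {
        String s = "[one][two]";
        System.out.println(greedy(s));    // #
        System.out.println(reluctant(s)); // ##
    }
}
```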

This doesn't negate other answers, but they depend on your test input being as simple as it is in your question. This gives me the correct output without changing the spirit of your original regex (that only the parentheses and dot delimiter are known to be present):

String test = "(a.1) (b.2) (c.3) (d.4)";
String replaced = test.replaceAll("\\(.*?[.](.*?)\\)", "($1)");
System.out.println(replaced); // "(1) (2) (3) (4)"

Upvotes: 3

Nambi_0915

Reputation: 1091

Try this

str.replaceAll("[A-Za-z0-9]+\\.","");

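[A-Za-z0-9]+ matches one or more upper case letters, lower case letters and digits, so the pattern removes each such run together with the dot that follows it. If you want to allow anything before the dot, use a negated class such as [^().]+ in place of [A-Za-z0-9]+; a plain .+ or .* is greedy and would also consume the opening parenthesis and the earlier groups.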
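A minimal sketch of this prefix-removal approach on the question's string, alongside a greedy .+ in the same position, which behaves quite differently on this input. The class and method names are hypothetical.

```java
class PrefixRemovalDemo {
    // Removes each run of letters/digits together with the dot that follows it.
    static String alnumPrefix(String input) {
        return input.replaceAll("[A-Za-z0-9]+\\.", "");
    }

    // Greedy .+ before the dot: consumes everything up to the LAST dot in the string.
    static String greedyDotPrefix(String input) {
        return input.replaceAll(".+\\.", "");
    }

    public static void main(String[] args) {
        String s = "(a.1) (b.2) (c.3) (d.4)";
        System.out.println(alnumPrefix(s));     // (1) (2) (3) (4)
        System.out.println(greedyDotPrefix(s)); // 4)
    }
}
```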

Upvotes: 0
