b3bop

Reputation: 3683

Java regex quantifiers

I have a string like

String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";

I need a regex to give me the following output:

number0 foobar
number1 foofoo
number2 bar bar bar bar
number3 foobar

I have tried

Pattern pattern = Pattern.compile("number\\d+(.*)(number\\d+)?");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group());
}

but this gives

number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar

Upvotes: 7

Views: 1118

Answers (6)

vidak

Reputation: 1

Pattern pattern = Pattern.compile("\\w+\\d(\\s\\w+)\\1*");
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    System.out.println(matcher.group());
}

Upvotes: 0

shift66

Reputation: 11958

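Because .* is greedy, it consumes everything up to the end of the string. Use the reluctant form .*? instead, and turn the trailing number\\d+ into a lookahead so that it is not consumed as part of the match:

Pattern pattern = Pattern.compile("number\\d+(.*?)(?=number\\d+|$)");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // trim() drops the space left in front of the next "number"
    System.out.println(matcher.group().trim());
}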

Upvotes: 0

Daniel

Reputation: 28074

Why don't you just match for number\\d+, query the match location, and do the String splitting yourself?
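
A minimal sketch of that idea, assuming the pieces should be cut wherever number\\d+ starts and then trimmed. The class ManualSplit and the helper splitAtNumbers are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualSplit {
    // Hypothetical helper: cuts the input at every position where "number<digits>" starts.
    static List<String> splitAtNumbers(String input) {
        Matcher matcher = Pattern.compile("number\\d+").matcher(input);
        List<String> parts = new ArrayList<>();
        int previousStart = -1; // start index of the previous match, -1 if none yet
        while (matcher.find()) {
            if (previousStart >= 0) {
                // Everything from the previous match up to this one is one piece.
                parts.add(input.substring(previousStart, matcher.start()).trim());
            }
            previousStart = matcher.start();
        }
        if (previousStart >= 0) {
            // The last piece runs to the end of the string.
            parts.add(input.substring(previousStart).trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        for (String part : splitAtNumbers(string)) {
            System.out.println(part);
        }
    }
}
```

Any text before the first number\\d+ is dropped in this sketch; whether that is wanted depends on the real input.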

Upvotes: 0

LeleDumbo

Reputation: 9340

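The (.*) part of your regex is greedy, so it consumes everything from that point to the end of the string. Change it to the non-greedy variant: (.*?). Note that a non-greedy quantifier matches as little as possible, so it needs something after it to stop at, for example a lookahead such as (?=number\\d+|$).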

http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
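
To make the difference concrete, here is a small sketch that compares the greedy and non-greedy quantifiers on a shortened version of the input. The class GreedyVsLazy and the helper firstMatch are hypothetical names, and the lookahead in the third pattern is one possible way to give the non-greedy quantifier a stopping point:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "number0 foobar number1 foofoo";

        // Greedy: .* runs to the end of the input.
        System.out.println("[" + firstMatch("number\\d+.*", input) + "]");

        // Non-greedy with nothing after it: .*? is satisfied with zero characters.
        System.out.println("[" + firstMatch("number\\d+.*?", input) + "]");

        // Non-greedy with a lookahead: .*? stops right before the next number or at the end.
        System.out.println("[" + firstMatch("number\\d+.*?(?=number\\d+|$)", input) + "]");
    }
}
```

The brackets in the output make the trailing space of the third match visible; calling trim() on the result removes it.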

Upvotes: -1

AlexR

Reputation: 115328

If "foobar" is just an example and you really mean "any word", use the following pattern: (number\\d+)\\s+(\\w+)
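
A short sketch of how the two capturing groups of this pattern could be read, assuming exactly one word should follow each number. The class NumberAndWord and the helper pairs are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberAndWord {
    // Hypothetical helper: returns "number<digits> word" for every match.
    static List<String> pairs(String input) {
        Matcher matcher = Pattern.compile("(number\\d+)\\s+(\\w+)").matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            // group(1) is the number token, group(2) the single word after it.
            result.add(matcher.group(1) + " " + matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        for (String pair : pairs(string)) {
            System.out.println(pair);
        }
    }
}
```

Because (\\w+) captures a single word, the third entry comes out as "number2 bar" rather than "number2 bar bar bar bar", so this only fits input where each number is followed by one word.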

Upvotes: 0

Tim Pietzcker

Reputation: 336078

So you want number (+ an integer) followed by anything until the next number (or end of string), right?

Then you need to tell that to the regex engine:

Pattern pattern = Pattern.compile("number\\d+(?:(?!number).)*");

In your regex, the .* matched as much as it could - everything until the end of the string. Also, you made the second part (number\\d+)? part of the match itself.

Explanation of my solution:

number    # Match "number"
\d+       # Match one or more digits
(?:       # Match...
 (?!      #  (as long as we're not right at the start of the text
  number  #   "number"
 )        #  )
 .        # any character
)*        # Repeat as needed.
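
For completeness, a runnable sketch that applies this pattern to the string from the question. The class TemperedDot and the helper extract are hypothetical names, and the trim() call is an assumption that the space in front of the next "number" is not wanted in the output:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedDot {
    // Hypothetical helper: collects every "number<digits> ..." chunk from the input.
    static List<String> extract(String input) {
        // Each character is consumed only if the text "number" does not start at it.
        Pattern pattern = Pattern.compile("number\\d+(?:(?!number).)*");
        Matcher matcher = pattern.matcher(input);
        List<String> chunks = new ArrayList<>();
        while (matcher.find()) {
            // The raw match keeps the space before the next "number"; trim() removes it.
            chunks.add(matcher.group().trim());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        for (String chunk : extract(string)) {
            System.out.println(chunk);
        }
    }
}
```

Note that the dot does not match line terminators by default, so input spanning several lines would need Pattern.DOTALL.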

Upvotes: 10
