Melih Altıntaş

Reputation: 2535

How does this regex work?

System.out.println("du hast mich".replaceAll("(?<=^(.*)) ", ", $1 "));
// prints "du, du hast, du hast mich"

What is the meaning of the ^ symbol inside the lookbehind? (I know its standard meaning is the start of the line.) And why does the dot match up to "du", then "du hast", then "du hast mich"? In brief, why didn't the dot match the whole string?

Please explain how this regex actually works. I am curious. Thanks for your interest.

Upvotes: 4

Views: 160

Answers (3)

Alan Moore

Reputation: 75222

That regex shouldn't work at all. What it should do is throw an exception because of the open-ended quantifier (.*) in the lookbehind. You seem to have discovered a glitch that lets you bypass that rule. But don't use it! It's definitely a bug, not a feature.

Java's lookbehinds have always been a little twitchy, which I attribute to the complicated requirement that every lookbehind subexpression have a known maximum length. I've come to feel that feature was a mistake; it's just not useful enough to justify the hassles it brought with it. This is why I try to avoid using any quantifiers in my lookbehinds.
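If you follow that advice and keep quantifiers out of the lookbehind, the same output can be produced by matching each space on its own and building the replacement from the match position. The following is a minimal sketch under that assumption; the class PrefixEcho and the method echoPrefixes are hypothetical names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixEcho {
    // Hypothetical helper: replaces every space with ", " + <text before that space> + " ",
    // giving the same result the question's lookbehind regex prints, but with no
    // quantifier (and no lookbehind at all) in the pattern.
    static String echoPrefixes(String input) {
        Matcher m = Pattern.compile(" ").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Everything from the start of the input up to (not including) this space.
            String prefix = input.substring(0, m.start());
            // quoteReplacement keeps '$' or '\' in the prefix from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(", " + prefix + " "));
        }
        m.appendTail(out); // copy the text after the last space ("mich")
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(echoPrefixes("du hast mich")); // prints "du, du hast, du hast mich"
    }
}
```

Because the prefix comes from input.substring(0, m.start()) rather than from a capture inside a lookbehind, this version does not depend on how a particular JDK handles an unbounded lookbehind.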

Upvotes: 0

Sotirios Delimanolis

Reputation: 279910

Kendall has the explanation. Here's the step-by-step.

du hast mich
 ^ regex hasn't matched anything so no replacement

writes

du

Next

du hast mich
  ^ regex matches

replaces the matched space with a comma, a space, everything before the space (group 1), and another space

, du

Next

du hast mich
      ^ no match

writes

hast

Next

du hast mich
       ^ regex matches

replaces that match the same way: a comma, a space, everything before this space (group 1), and another space

, du hast

Next

du hast mich
           ^ no match

leaves it as is

mich

combine all that and you get

du, du hast, du hast mich
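The walkthrough above can be reproduced in code by listing the pieces that end up in the result: the unchanged text between spaces, and the replacement written for each space. This sketch matches a plain space instead of using the lookbehind, on the assumption that what group 1 is meant to capture is simply the text before that space; ReplaceTrace and pieces are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceTrace {
    // Hypothetical tracer: returns, in order, each chunk of text copied unchanged
    // and each replacement written for a space, mirroring the step-by-step walkthrough.
    static List<String> pieces(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(" ").matcher(input);
        int last = 0; // end index of the previous match
        while (m.find()) {
            out.add(input.substring(last, m.start()));           // text copied as-is
            out.add(", " + input.substring(0, m.start()) + " "); // replacement for this space
            last = m.end();
        }
        out.add(input.substring(last)); // trailing text after the last space
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = pieces("du hast mich");
        // Bracketed so the spaces are visible: [du] [, du ] [hast] [, du hast ] [mich]
        parts.forEach(p -> System.out.print("[" + p + "] "));
        System.out.println();
        System.out.println(String.join("", parts)); // prints "du, du hast, du hast mich"
    }
}
```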

Upvotes: 2

Kendall Frey

Reputation: 44316

(?<= ) is the syntax for lookbehind. The ^ is just the "start of string" anchor. Essentially what the regex is saying is:

"Match a space which is preceded by the start of the string and any number of characters. The characters preceding the space are the first captured group."
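To see that ^ inside a lookbehind really is the ordinary start-of-input anchor, compare a lookbehind with and without it. The sketch below uses a fixed-length lookbehind, which Java accepts without relying on any quirk; the class name AnchorInLookbehind and the input "du du du" are chosen only for illustration.

```java
public class AnchorInLookbehind {
    public static void main(String[] args) {
        String s = "du du du";
        // Without ^: every space preceded by "du" matches (the spaces at index 2 and 5).
        System.out.println(s.replaceAll("(?<=du) ", "_"));        // prints "du_du_du"
        // With ^: "du" must also start at index 0, so only the first space matches.
        System.out.println(s.replaceAll("(?<=^du) ", "_"));       // prints "du_du du"
        // A group captured inside the lookbehind stays available to the replacement as $1.
        System.out.println(s.replaceAll("(?<=^(du)) ", ", $1 ")); // prints "du, du du du"
    }
}
```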

Upvotes: 3
