Reputation: 203
I want to explore regular expressions a bit more, specifically inserting a separator into a string while counting from right to left.
The result of this regex
preg_replace("/(?=(.{3})*(.{4})$)/", "-", "1231231234");
is: 123-123-1234
Now I am experimenting with quantifiers and groups, but I cannot get them to work properly.
Why do this (PHP)
preg_replace("/(?=(.{3})*(.{4})(.{4})$)/", "-", "1212312312345678");
and this:
preg_replace("/(?=(.{3})*(.{4}){2}$)/", "-", "1212312312345678");
both give me one big 8-character group in the output:
12-123-123-12345678
I might have expected that result in the second case ({2}), but not in the first.
The result I wanted was:
12-123-123-1234-5678
1) What is the logic behind (.{4})(.{4}) behaving like (.{8}) instead of as two separate groups?
2) What would be the proper grouping?
Upvotes: 0
Views: 89
Reputation: 43136
You seem to misunderstand how that regex works. Let me break it down for you:
(?=          lookahead assertion: the following pattern must match, but
             will not consume any of the text.
  (.{3})*    matches a series of 3 characters, any number of times. In
             other words, this consumes characters in multiples of 3.
  (.{4})$    makes sure there are exactly 4 characters left.
)
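The breakdown above can be checked directly in PHP. The following is a minimal sketch (the variable names are mine, not from the question) that uses preg_match_all with PREG_OFFSET_CAPTURE to list where the lookahead succeeds in the text 31231234 and to confirm that each match is empty:

```php
<?php
// Pattern from the question: a lookahead only, so every match is zero-width.
$pattern = '/(?=(.{3})*(.{4})$)/';
$text    = '31231234';

// PREG_OFFSET_CAPTURE stores each match as [matched string, offset].
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

$offsets = array_column($matches[0], 1);                      // where it matched
$lengths = array_map('strlen', array_column($matches[0], 0)); // how much it consumed

print_r($offsets); // offsets 1 and 4
print_r($lengths); // both lengths are 0: nothing was consumed
```

Offsets 1 and 4 are exactly the positions where 7 (3 + 4) and 4 characters remain.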
This pattern produces an empty match at every place where you want to insert a dash (-). That's why preg_replace("/(?=(.{3})*(.{4})$)/", "-", "1231231234"); inserts dashes in the correct places: replacing the empty string is the same as inserting. Let's look at that step by step, using the text 31231234 as an example:
         remaining text   remaining pattern   what happens
step 0:  31231234         (.{3})*(.{4})$      (.{3})* matches one time
step 1:  31234            (.{3})*(.{4})$      (.{3})* matches again
step 2:  34               (.{3})*(.{4})$      (.{3})* fails to match another time
step 3:  34               (.{4})$             (.{4}) fails to match -> backtrack
step 4:  31234            (.{4})$             (.{4}) matches 3123, but $ fails (4 is left) -> backtrack
step 5:  31231234         (.{4})$             (.{4}) matches 3123, but $ fails (1234 is left) -> pattern
                                              failed to match, no dash will be inserted.
After the pattern fails to match at position 0 in the text, it is checked again at position 1 (remaining text is 1231234):
         remaining text   remaining pattern   what happens
step 0:  1231234          (.{3})*(.{4})$      (.{3})* matches one time
step 1:  1234             (.{3})*(.{4})$      (.{3})* matches again
step 2:  4                (.{3})*(.{4})$      (.{3})* fails to match another time
step 3:  4                (.{4})$             (.{4}) fails to match -> backtrack
step 4:  1234             (.{4})$             (.{4})$ matches -> dash will be inserted
                                              here, giving "3-1231234"
The same thing happens again 3 characters later, giving the end result 3-123-1234. In other words, the group (.{4})$ specifies that no dashes should be inserted in the last 4 characters of the text. By consuming the last 4 characters, it makes it impossible for the pattern to match if there are fewer than 4 characters remaining. That is why both (.{4})(.{4})$ and (.{4}){2}$ produce a block of 8 characters: the pattern cannot match if fewer than 8 characters remain.
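The arithmetic behind this can be sketched without any regex. The helper dashOffsets below is hypothetical (my own name, not part of PHP or of the question): it lists every offset where the number of remaining characters equals the fixed tail plus a multiple of 3, which is the condition the lookahead tests. The last lines compare it with what the {2} pattern from the question actually reports.

```php
<?php
// Hypothetical helper: offsets where "(?=(.{3})*.{tail}$)" can succeed,
// i.e. where the remaining length is $tail plus a multiple of 3.
function dashOffsets(int $length, int $tail): array
{
    $offsets = [];
    for ($pos = 0; $pos <= $length; $pos++) {
        $remaining = $length - $pos;
        if ($remaining >= $tail && ($remaining - $tail) % 3 === 0) {
            $offsets[] = $pos;
        }
    }
    return $offsets;
}

$text = '1212312312345678'; // 16 characters

$tail4 = dashOffsets(strlen($text), 4); // [0, 3, 6, 9, 12]
$tail8 = dashOffsets(strlen($text), 8); // [2, 5, 8]: nothing inside the last 8

// Cross-check against the pattern from the question.
preg_match_all('/(?=(.{3})*(.{4}){2}$)/', $text, $m, PREG_OFFSET_CAPTURE);
$regexOffsets = array_column($m[0], 1); // [2, 5, 8]

print_r($tail8);
print_r($regexOffsets);
```

With a tail of 8 no offset beyond 8 qualifies, so the final 8 characters can never be split by this pattern alone.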
In order to insert another dash within the last 8 characters, you have to use two groups of 4 characters, .{4}, and make the first one (together with the preceding (.{3})*) optional:
(?=((.{3})*.{4})?(.{4})$)
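Applied to the input from the question, this could look as follows. It is only a sketch; the second pattern is my own variant with non-capturing groups, which I would expect to behave the same since the captures are never used.

```php
<?php
$text = '1212312312345678';

// Pattern from this answer: the "(.{3})* plus first .{4}" part is optional.
$fixed = '/(?=((.{3})*.{4})?(.{4})$)/';
$result = preg_replace($fixed, '-', $text);

// Assumed equivalent variant using non-capturing groups (?: ... ).
$nonCapturing = '/(?=(?:(?:.{3})*.{4})?.{4}$)/';
$resultNonCapturing = preg_replace($nonCapturing, '-', $text);

echo $result, PHP_EOL;             // 12-123-123-1234-5678
echo $resultNonCapturing, PHP_EOL; // 12-123-123-1234-5678
```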
Upvotes: 1
Reputation: 21492
(?=(.{3})*(.{4}){2}$) succeeds at every position that is followed by 3xN characters plus 2x4 = 8 characters up to the end, where N >= 0.
To match at the positions 4xN characters from the end, where 1 <= N <= 2, and at every position followed by 3xN characters plus 8 characters at the end, where N >= 1, use the following:
preg_replace("/(?=(.{4}){1,2}$)|(?=(.{3})+.{8}$)/", "-", "1212312312345678");
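To see which alternative contributes which dash, the two lookaheads can also be run separately. This is a small sketch with variable names of my own choosing:

```php
<?php
$text = '1212312312345678';

// Combined pattern from this answer.
$result = preg_replace('/(?=(.{4}){1,2}$)|(?=(.{3})+.{8}$)/', '-', $text);
echo $result, PHP_EOL; // 12-123-123-1234-5678

// Each alternative on its own, with the offsets of its empty matches.
preg_match_all('/(?=(.{4}){1,2}$)/', $text, $a, PREG_OFFSET_CAPTURE);
preg_match_all('/(?=(.{3})+.{8}$)/', $text, $b, PREG_OFFSET_CAPTURE);

$fromFours  = array_column($a[0], 1); // [8, 12]: 8 and 4 characters from the end
$fromThrees = array_column($b[0], 1); // [2, 5]: 14 and 11 characters from the end

print_r($fromFours);
print_r($fromThrees);
```

The two sets of offsets do not overlap, so no position receives more than one dash.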
Upvotes: 1
Reputation: 8413
Note that you are using lookaheads in this case. Unlike normal matching, they don't actually consume what they match.
So in the first example, there is a zero-width match after the first 123, where the lookahead matches 1231234, and another after the second 123, where the lookahead matches 1234. You might want to use one of the online regex testers to see what actually matches; my choice would be regex101.com.
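The same inspection can be done locally in PHP. The sketch below lists the offsets of the zero-width matches for the first example and the text each lookahead sees. It also reports offset 0, because all 10 characters split as 3 + 3 + 4, so the raw replacement starts with a dash. The (?!^) guard in the last call is my own addition, not something from the thread, and is just one possible way to skip the start of the string.

```php
<?php
$text    = '1231231234';
$pattern = '/(?=(.{3})*(.{4})$)/';

preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1); // [0, 3, 6]

// The text the lookahead inspects at each offset (it is not consumed).
foreach ($offsets as $offset) {
    echo $offset, ': ', substr($text, $offset), PHP_EOL;
}

$raw = preg_replace($pattern, '-', $text);
echo $raw, PHP_EOL; // -123-123-1234

// Assumption: a negative lookahead for the string start suppresses offset 0.
$guarded = preg_replace('/(?!^)(?=(.{3})*(.{4})$)/', '-', $text);
echo $guarded, PHP_EOL; // 123-123-1234
```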
So for your example you have to make the lookahead also match when only the last 4 digits remain. One way to achieve this is (?=((.{3})*(.{4}))?(.{4})$), which makes the first part optional.
See it here on regex101.
Upvotes: 2