Reputation: 53
I'm trying to match numbers surrounded by spaces, as in this string: " 1 2 3 "
I'm puzzled why the regex \s[0-9]\s matches 1 and 3 but not 2. Why does this happen?
Upvotes: 0
Views: 67
Reputation: 700592
The expression \s[0-9]\s matches " 1 " and " 3 ". Because the space after the 1 has already been consumed by the first match, it can't also be used to match " 2 ".
You can use a positive lookbehind and a positive lookahead to match digits that are surrounded by spaces:
(?<= )(\d+)(?= )
Demo: https://regex101.com/r/hT1dT6/1
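For anyone who wants to reproduce this locally, here is a minimal sketch in Python (the language is my assumption; the question does not say which regex flavor is in use). The variable names are only illustrative.

```python
import re

text = " 1 2 3 "

# Consuming pattern: every match swallows both surrounding spaces.
consuming = re.findall(r"\s[0-9]\s", text)

# Lookaround pattern: the spaces are only asserted, not consumed.
# With a single capture group, findall returns the group contents.
lookaround = re.findall(r"(?<= )(\d+)(?= )", text)

print(consuming)   # [' 1 ', ' 3 ']
print(lookaround)  # ['1', '2', '3']
```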
Upvotes: 0
Reputation: 16615
The trailing space is consumed by the first match, so the next match cannot reuse it. You can use a lookahead to fix this:
\s+\d+(?=\s+)
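Note that with this pattern the leading whitespace is still part of each match; only the trailing whitespace is left unconsumed. A quick sketch in Python (assumed here, since the regex flavor isn't stated) shows the difference. The second pattern, with a capture group around the digits, is my own variation and not part of the pattern above.

```python
import re

text = " 1 2 3 "

# Leading whitespace is consumed; trailing whitespace is only asserted.
with_space = re.findall(r"\s+\d+(?=\s+)", text)

# Hypothetical variation: a capture group isolates just the digits.
digits_only = re.findall(r"\s+(\d+)(?=\s+)", text)

print(with_space)   # [' 1', ' 2', ' 3']
print(digits_only)  # ['1', '2', '3']
```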
Upvotes: 0
Reputation: 61168
Because the space has already been consumed:
\s[0-9]\s
This matches "spacedigitspace", so let's go through the process:
" 1 2 3 "
 ^
 |
 No match
" 1 2 3 "
  ^
  |
  No match
" 1 2 3 "
   ^
   |
   Matches, consume " 1 "
"2 3 "
 ^
 |
 No match
"2 3 "
  ^
  |
  No match
"2 3 "
   ^
   |
   No match
"2 3 "
    ^
    |
    Matches, consume " 3 "
You want a lookaround:
(?<=\s)\d(?=\s)
This is very different, as it looks for \d and then asserts that it is preceded by, and followed by, a whitespace character. This assertion is "zero-width", which means that the spaces aren't consumed by the engine.
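One way to see the "zero-width" part is to look at the match spans. This is a small sketch in Python (my assumption for the language); the spans are (start, end) offsets into the string.

```python
import re

text = " 1 2 3 "

# Each consuming match is three characters wide and includes both spaces.
consuming_spans = [m.span() for m in re.finditer(r"\s[0-9]\s", text)]

# Each lookaround match is one character wide: only the digit itself.
lookaround_spans = [m.span() for m in re.finditer(r"(?<=\s)\d(?=\s)", text)]

print(consuming_spans)   # [(0, 3), (4, 7)]
print(lookaround_spans)  # [(1, 2), (3, 4), (5, 6)]
```

The consuming matches skip from offset 3 to offset 4, so the 2 at offset 3 has no preceding space left to pair with. The lookaround matches never cover the spaces, so every digit remains available.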
Upvotes: 4
Reputation: 726839
More precisely, the regex \s[0-9]\s fails to match 2 only when you go through all matches in the string " 1 2 3 " one by one. If you were to start matching at position 1 or 2, " 2 " would be matched.
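As a rough illustration, assuming Python and 0-based positions, a compiled pattern's search method accepts a starting offset, so you can check this directly:

```python
import re

text = " 1 2 3 "
pattern = re.compile(r"\s[0-9]\s")

# Start scanning at offset 1 and at offset 2 instead of offset 0.
from_1 = pattern.search(text, 1)
from_2 = pattern.search(text, 2)

print(repr(from_1.group()), from_1.span())  # ' 2 ' (2, 5)
print(repr(from_2.group()), from_2.span())  # ' 2 ' (2, 5)
```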
The reason for this is that \s consumes part of the input, namely the spaces around the digit. When you match " 1 ", the space between 1 and 2 is already taken; the regex engine is now looking at the tail of the string, which is "2 3 ". At this point, there is no space in front of 2 that the engine could consume, so it goes straight to finding " 3 ".
To fix this, put spaces into zero-length look-arounds, like this:
(?<=\s)[0-9](?=\s)
Now the engine ensures that there are spaces in front of and behind the digit without consuming these spaces as part of the match. This lets the engine treat the space between 1 and 2 as a space behind 1 and also as a space in front of 2, thus returning all three matches.
Upvotes: 1