PuzzledDavid

Reputation: 53

Regex on numbers and spaces

I'm trying to match numbers surrounded by spaces, like this string: " 1 2 3 "

I'm puzzled why the regex \s[0-9]\s matches 1 and 3 but not 2. Why does this happen?

Upvotes: 0

Views: 67

Answers (4)

Guffa

Reputation: 700592

The expression \s[0-9]\s matches " 1 " and " 3 ". Because the space after the 1 is consumed by the first match, it can't also be used to match " 2 ".

You can use a positive lookbehind and a positive lookahead to match digits that are surrounded by spaces:

(?<= )(\d+)(?= )

Demo: https://regex101.com/r/hT1dT6/1
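As a quick check of the difference, here is a minimal sketch in Python. The question does not name a language, so the choice of Python's re module and the variable names are assumptions made for illustration.

```python
import re

text = " 1 2 3 "

# Consuming pattern from the question: each match swallows the spaces on both sides.
consuming = re.findall(r"\s[0-9]\s", text)

# Lookaround pattern from this answer: the spaces are asserted, not consumed.
# With one capture group, findall returns the group contents.
lookaround = re.findall(r"(?<= )(\d+)(?= )", text)

print(consuming)   # [' 1 ', ' 3 ']
print(lookaround)  # ['1', '2', '3']
```

Note that this pattern uses a literal space rather than \s, so it would not treat tabs or newlines as separators.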

Upvotes: 0

epoch

Reputation: 16615

The first match consumes the trailing space, so the next match can't reuse it. You can use a lookahead to fix this:

\s+\d+(?=\s+)
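A minimal Python sketch of how the pattern above behaves (Python is an assumption, as the question names no language). The leading whitespace is still part of each match, so the second pattern, which adds a capture group and is not part of the original answer, is one way to get the bare digits.

```python
import re

text = " 1 2 3 "

# Pattern from the answer: leading whitespace is consumed,
# trailing whitespace is only asserted by the lookahead.
pattern = re.compile(r"\s+\d+(?=\s+)")
matches = [m.group() for m in pattern.finditer(text)]

# Hypothetical variant with a capture group around the digits.
digits = re.findall(r"\s+(\d+)(?=\s+)", text)

print(matches)  # [' 1', ' 2', ' 3']
print(digits)   # ['1', '2', '3']
```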

Upvotes: 0

Boris the Spider

Reputation: 61168

Because the space has already been consumed:

\s[0-9]\s

This matches "space, digit, space", so let's go through the process:

" 1 2 3 "

 ^
 |

No match

" 1 2 3 "

  ^
  |

No match

" 1 2 3 "

   ^
   |

Matches, consume " 1 "

"2 3 "

 ^
 |

No match

"2 3 "

  ^
  |

No match

"2 3 "

   ^
   |

No match

"2 3 "

    ^
    |

Matches, consume " 3 "
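The walk-through above can be reproduced with a small loop. This is a sketch in Python (an assumption, since the question names no language) that mimics how an engine resumes scanning after the end of each match; the names pos and found are illustrative only.

```python
import re

text = " 1 2 3 "
pattern = re.compile(r"\s[0-9]\s")

pos = 0
found = []
while True:
    # Search for the next match starting at the current position.
    m = pattern.search(text, pos)
    if m is None:
        break
    found.append((m.start(), m.end(), m.group()))
    # Resume after the match: its trailing space is no longer available.
    pos = m.end()
    print(f"consumed {m.group()!r}, remaining input: {text[pos:]!r}")

print(found)  # [(0, 3, ' 1 '), (4, 7, ' 3 ')]
```

After the first match the remaining input is "2 3 ", which has no space in front of the 2.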

You want a lookaround:

(?<=\s)\d(?=\s)

This is very different, as it looks for \d and then asserts that it is preceded by, and followed by, a whitespace character. These assertions are "zero width", which means that the spaces aren't consumed by the engine.

Upvotes: 4

Sergey Kalinichenko

Reputation: 726839

More precisely, the regex \s[0-9]\s does not match 2 only when you go through all matches in the string " 1 2 3 " one by one. If you were to try to start matching at positions 1 or 2, " 2 " would be matched.
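To illustrate, here is a small Python sketch (Python and 0-based positions are assumptions; the question names no language). The compiled pattern's search method accepts a start position, so the same regex can be tried from position 1 or 2.

```python
import re

text = " 1 2 3 "
pattern = re.compile(r"\s[0-9]\s")

# Start the search at index 1 (the digit 1) and at index 2 (the space after it).
from_1 = pattern.search(text, 1)
from_2 = pattern.search(text, 2)

print(from_1.group(), from_1.span())  # ' 2 ' at (2, 5)
print(from_2.group(), from_2.span())  # ' 2 ' at (2, 5)
```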

The reason for this is that \s consumes part of the input, namely the spaces around the digit. When you match " 1 ", the space between 1 and 2 is already taken; the regex engine is now looking at the tail of the string, which is "2 3 ". At this point, there is no space in front of 2 that the engine could consume, so it goes straight to finding " 3 ".

To fix this, put spaces into zero-length look-arounds, like this:

(?<=\s)[0-9](?=\s)

Now the engine ensures that there are spaces in front and behind the digit without consuming these spaces as part of the match. This lets the engine treat the space between 1 and 2 as a space behind 1 and also as a space in front of 2, thus returning both matches.
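A short Python sketch (Python is an assumption, the question names no language) makes the shared space visible by printing the span of each match: every match is a single digit, and the space at index 2 sits right after the first match and right before the second.

```python
import re

text = " 1 2 3 "
pattern = re.compile(r"(?<=\s)[0-9](?=\s)")

# Each span covers only the digit; the surrounding spaces stay unconsumed.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(1, 2), (3, 4), (5, 6)]

# The character between the first two matches is the shared space.
shared = text[spans[0][1]:spans[1][0]]
print(repr(shared))  # ' '
```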

Upvotes: 1
