Reputation: 3567
I have a string that comes from user input through a messaging system. It can contain a series of 4-digit numbers, but since users are likely to mistype things, the matching needs to be a little flexible. I therefore want to let them type in just the numbers, or pepper their message with any other characters, and then take only the numbers that match one of these formats:
=nnnn or nnnn
For this I have the Regular Expression:
(^|=|\s)\d{4}(\s|$)
This almost works. However, because it requires each group of 4 digits to be preceded by an =, a space, or the start of the string, it misses every other set of numbers.
I tried this:
(^|=|\s*)\d{4}(\s|$)
But that means that any four digits followed by a space get matched - which is incorrect.
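To make the false positive concrete, here is a minimal Python sketch comparing the two attempts. The sample input is hypothetical (not from the real message stream), and Python's `re` module is assumed only for illustration; the regex flavour actually in use may differ.

```python
import re

# Second attempt: \s* can match the empty string, so the leading
# group no longer demands a real delimiter before the digits.
loose = re.compile(r"(^|=|\s*)\d{4}(\s|$)")
# First attempt, for comparison: needs start of string, '=' or whitespace.
strict = re.compile(r"(^|=|\s)\d{4}(\s|$)")

# Hypothetical input: "12345" is a 5-digit run and should not count.
sample = "id12345 ok"

loose_match = loose.search(sample)
strict_match = strict.search(sample)

# The loose pattern picks four digits out of the middle of the run.
print(repr(loose_match.group()) if loose_match else None)
# The strict pattern finds nothing in this input.
print(strict_match)
```

The loose pattern accepts the tail of a longer digit run as long as a space follows it, which is the incorrect behaviour described above.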
How can I match groups of numbers when a single space is both the end of one group and the beginning of the next? To clarify, this string:
Ack 9876 3456 3467 4578 4567
Should produce the matches:
9876
3456
3467
4578
4567
Upvotes: 1
Views: 5199
Reputation: 67978
\b\d{4}\b
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W). It is a zero-width anchor, much like ^ and $. It doesn't consume any characters.
or
(?:^|(?<=[=\s]))\d{4}\b
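As a rough check, the lookbehind variant can be tried with Python's `re` module. This is a sketch under the assumption that the target engine supports fixed-width lookbehind the way Python does; the second input string is made up to show what gets rejected.

```python
import re

# Digits must sit at the start of the string or directly after '=' or
# whitespace (checked by lookbehind, so nothing is consumed), and must
# end at a word boundary.
pattern = re.compile(r"(?:^|(?<=[=\s]))\d{4}\b")

message = "Ack 9876 3456 3467 4578 4567"
codes = pattern.findall(message)
print(codes)

# Hypothetical extra input: only the '='-prefixed group qualifies;
# the 5-digit runs are rejected.
print(pattern.findall("=1234 x12345 12345"))
```

Because neither the lookbehind nor \b consumes the separating space, the same space can end one group and begin the next.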
Upvotes: 1
Reputation: 174786
Here you need to use lookarounds, which don't consume any characters.
(?:^|[=\s])\K\d{4}(?=\s|$)
OR
(?:^|[=\s])(\d{4})(?=\s|$)
Your regex (^|=|\s)\d{4}(\s|$) fails because it first matches <space>9876<space>, and then looks for another space, equals sign, or start of the line. The next match it finds is therefore <space>3467<space>. It won't match 3456 because the space before 3456 was already consumed by the first match. In order to get overlapping matches, you need to put the pattern inside a lookaround. When you put the last pattern (\s|$) inside a lookahead, it no longer consumes the space; it just asserts that the match must be followed by a space or the end of the line.
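The difference can be sketched in Python. Note that this is only an illustration: Python's built-in `re` module does not support \K, so the capture-group form of the pattern is used here, and the sample string is the one from the question.

```python
import re

message = "Ack 9876 3456 3467 4578 4567"

# Original pattern: the trailing (\s|$) consumes the separating space.
consuming = re.compile(r"(^|=|\s)\d{4}(\s|$)")
# Lookahead version: the trailing space is only asserted, not consumed.
lookahead = re.compile(r"(?:^|[=\s])(\d{4})(?=\s|$)")

# strip() just removes the surrounding spaces for display.
consumed = [m.group().strip() for m in consuming.finditer(message)]
# With one capture group, findall returns the captured digits.
found = lookahead.findall(message)

print(consumed)  # every other group is skipped
print(found)     # all five groups
```

With the consuming pattern, the space after each match is no longer available as the leading delimiter of the next group, so alternate groups are lost; the lookahead leaves that space in place for the next match.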
Upvotes: 2