Reputation: 77
I have the following regular expression for capturing positive & negative time offsets.
\b(?<sign>[\-\+]?)(?<hours>2[1-3]|[01][0-9]|[1-9]):(?<minutes>[0-5]\d)\b
It matches fine but the leading sign doesn't appear in the capture group. Am I formatting it wrong? You can see the effect here https://regex101.com/r/CQxL8q/1/
Upvotes: 1
Views: 306
Reputation: 72256
The word boundary anchor (\b) matches the transition from a word character (letter, digit or underscore) to a non-word character, or vice versa. There is no such transition in front of the - in -13:21, so the regex cannot start matching at the sign.
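To see the effect outside regex101, here is a minimal Python sketch. It assumes the only change needed is Python's spelling of named groups, `(?P<name>...)` instead of `(?<name>...)`; the name `ORIGINAL` is just for illustration.

```python
import re

# The pattern from the question; only the named-group syntax is adapted to Python.
ORIGINAL = re.compile(
    r"\b(?P<sign>[\-\+]?)(?P<hours>2[1-3]|[01][0-9]|[1-9]):(?P<minutes>[0-5]\d)\b"
)

m = ORIGINAL.search("-13:21")

# There is no word boundary before "-", so the match starts at the "1"
# and the optional sign group captures an empty string.
print(m.group(0))             # 13:21
print(repr(m.group("sign")))  # ''
print(m.start())              # 1
```

The sign is optional, so the engine is happy to match without it once the leading \b forces the match to start one character later.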
The word boundary anchor can be placed between the sign and the hours to avoid matching inside expressions that merely look similar to a time (65401:23), but it cannot prevent a match in 654:01:23 or 654-01:23.
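A quick check of those three inputs, as a Python sketch. It assumes the boundary sits right after the sign group and uses Python's `(?P<name>...)` group syntax; `MOVED` and `first_match` are made-up names.

```python
import re

# Word boundary placed between the optional sign and the hours.
MOVED = re.compile(
    r"(?P<sign>[-+]?)\b(?P<hours>2[0-3]|[01][0-9]|[0-9]):(?P<minutes>[0-5][0-9])\b"
)

def first_match(text):
    """Return the text of the first match, or None if nothing matches."""
    m = MOVED.search(text)
    return m.group(0) if m else None

print(first_match("65401:23"))   # None: no boundary inside the run of digits
print(first_match("654:01:23"))  # 01:23: the ":" creates a boundary before "01"
print(first_match("654-01:23"))  # -01:23: the "-" is consumed as the sign
```

Rejecting the last two inputs would need something stronger than \b, for example a lookbehind, which is outside what this answer covers.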
As a side note, [\-\+] is just a convoluted way to write [-+]. + does not have any special meaning inside a character class, so there is no need to escape it. - is a special character inside a character class, but not when it is the first or the last character (i.e. [- or -]).
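The equivalence is easy to confirm. This is a small Python sketch; the sample string is arbitrary and other regex flavors may differ in details.

```python
import re

sample = "+1 -2 \\3"  # contains a plus, a minus and a literal backslash

escaped = re.findall(r"[\-\+]", sample)
plain = re.findall(r"[-+]", sample)    # "-" first: literal
swapped = re.findall(r"[+-]", sample)  # "-" last: literal

print(escaped, plain, swapped)  # ['+', '-'] ['+', '-'] ['+', '-']

# In the middle, "-" builds a range: [+-9] covers every character from "+" to "9".
ranged = re.findall(r"[+-9]", "a,b.5")
print(ranged)  # [',', '.', '5']
```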
Another remark: you use both [0-9] and \d in your regex. They denote the same thing [1] but, for readability, it is better to stick to one convention. Since other character classes that contain only digits are used, I would use [0-9] and not \d.
There are also some bugs in the regex fragment for hours: 2[1-3]|[01][0-9]|[1-9] does not match 0 (although it matches 00) or 20.
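The gaps show up when each fragment is tested on its own. This is a Python sketch; `ORIGINAL_HOURS`, `FIXED_HOURS` and `accepts` are illustrative names, and the fixed fragment assumes the intent is to accept 0 through 23 with an optional leading zero.

```python
import re

ORIGINAL_HOURS = r"(?:2[1-3]|[01][0-9]|[1-9])"
FIXED_HOURS = r"(?:2[0-3]|[01][0-9]|[0-9])"

def accepts(fragment, text):
    """True if the whole text is matched by the fragment."""
    return re.fullmatch(fragment, text) is not None

# Columns: input, accepted by the original fragment, accepted by the fixed one.
for hours in ("0", "00", "9", "20", "23", "24"):
    print(hours, accepts(ORIGINAL_HOURS, hours), accepts(FIXED_HOURS, hours))
```

Only 0 and 20 change from rejected to accepted; 24 is rejected by both fragments.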
Given all the above corrections and improvements, the regex should be:
(?<sign>[-+]?)\b(?<hours>2[0-3]|[01][0-9]|[0-9]):(?<minutes>[0-5][0-9])\b
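For reference, this is how the corrected regex behaves in Python. It is a sketch with the named groups rewritten as `(?P<name>...)`; `FIXED` and `parts` are made-up names.

```python
import re

FIXED = re.compile(
    r"(?P<sign>[-+]?)\b(?P<hours>2[0-3]|[01][0-9]|[0-9]):(?P<minutes>[0-5][0-9])\b"
)

def parts(text):
    """Return the named groups of the first match, or None if nothing matches."""
    m = FIXED.search(text)
    return m.groupdict() if m else None

print(parts("-13:21"))  # {'sign': '-', 'hours': '13', 'minutes': '21'}
print(parts("+0:30"))   # {'sign': '+', 'hours': '0', 'minutes': '30'}
print(parts("20:15"))   # {'sign': '', 'hours': '20', 'minutes': '15'}
print(parts("24:00"))   # None
```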
[1] \d is the same as [0-9] when the Unicode flag is not set. When Unicode is enabled, \d also matches the digits of non-Latin scripts.
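The footnote can be illustrated with Python, whose `re` module treats `str` patterns as Unicode-aware by default and offers `re.ASCII` to turn that off. Other engines expose this switch differently, so take it as one flavor's behaviour only.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE

unicode_d = re.fullmatch(r"\d", arabic_three) is not None
ascii_d = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None
bracket = re.fullmatch(r"[0-9]", arabic_three) is not None

print(unicode_d)  # True: \d is Unicode-aware here
print(ascii_d)    # False: re.ASCII restricts \d to 0-9
print(bracket)    # False: [0-9] never matches it
```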
Upvotes: 1
Reputation: 627082
That is because of the first \b. The \b word boundary does not match between the start of the string (or a newline) and a - or + (i.e. a non-word char).
You need to move the word boundary after the optional sign group:
(?<sign>[-+]?)\b(?<hours>2[1-3]|[01][0-9]|[1-9]):(?<minutes>[0-5][0-9])\b
              ^^
See the regex demo.
Now, since the char following the word boundary is a digit (a word char), the word boundary works correctly and fails all matches where the digit is preceded by another word char.
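A short Python sketch of that behaviour, using the regex above with Python's `(?P<name>...)` group syntax; `MOVED_BOUNDARY` and `signed_match` are illustrative names.

```python
import re

MOVED_BOUNDARY = re.compile(
    r"(?P<sign>[-+]?)\b(?P<hours>2[1-3]|[01][0-9]|[1-9]):(?P<minutes>[0-5][0-9])\b"
)

def signed_match(text):
    """Return (sign, whole match) for the first match, or None."""
    m = MOVED_BOUNDARY.search(text)
    return (m.group("sign"), m.group(0)) if m else None

print(signed_match("-13:21"))            # ('-', '-13:21')
print(signed_match("offset +9:05 now"))  # ('+', '+9:05')

# A word char directly in front of the first digit leaves no boundary there.
print(signed_match("x13:21"))  # None
print(signed_match("_13:21"))  # None
print(signed_match("113:21"))  # None
```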
Upvotes: 1