Reputation: 19110
I want to remove the space between a one- or two-digit number and a colon, when the colon is followed by a space, a digit, or the end of the line. If I have a string like,
line = " 0 : 28 : 37.02"
the result should be:
" 0: 28: 37.02"
I tried this:
line.gsub!(/(\A|[ \u00A0|\r|\n|\v|\f])(\d?\d)[ \u00A0|\r|\n|\v|\f]:(\d|[ \u00A0|\r|\n|\v|\f]|\z)/, '\2:\3')
# => " 0: 28 : 37.02"
It seems to match the first ":", but the second ":" is not matched. I can't figure out why.
Upvotes: 0
Views: 398
Reputation: 110675
The problem
I'll define your regex with comments (in free-spacing mode) to show what it is doing.
r =
/
( # begin capture group 1
\A # match beginning of string (or does it?)
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
) # end capture group 1
(\d?\d) # match one or two digits in capture group 2
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
: # match ":"
( # begin capture group 3
\d # match a digit
| # or
[ \u00A0|\r|\n|\v|\f] # match one of the characters in the string " \u00A0|\r\n\v\f"
| # or
\z # match the end of the string
) # end capture group 3
/x # free-spacing regex definition mode
Note that '|' is not a special character ("or") within a character class; it's treated as an ordinary character. (Even if '|' were treated as "or" within a character class, that would serve no purpose, because a character class already matches any one of the characters it contains.)
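A quick check of the point about '|' (the sample strings are made up for illustration). Inside brackets the pipe is just another member of the class, so the questioner's class even accepts a '|' where a space was intended.

```ruby
# Inside [...], '|' is a literal member of the class, not alternation.
p "a|b".scan(/[a|]/)  # => ["a", "|"]

# Outside a class, '|' separates alternatives.
p "a|b".scan(/a|b/)   # => ["a", "b"]

# The questioner's class therefore lets a pipe stand in for the space before ":".
p "7|:".match?(/\d[ \u00A0|\r|\n|\v|\f]:/)  # => true
```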
Suppose
line = " 0 : 28 : 37.02"
Then
line.gsub(r, '\2:\3')
#=> " 0: 28 : 37.02"
$1 #=> " "
$2 #=> "0"
$3 #=> " "
In capture group 1, \A can match only at the very beginning of the string, where the next character is a space rather than a digit, so that alternative never leads to a match here. The engine therefore falls back on the other alternative, the character class, which matches any one character of the string " \u00A0|\r\n\v\f". It can match any of the three spaces at the beginning of the string line.
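Next, capture group 2 captures "0". For it to do that, capture group 1 must have captured the space at index 2 of line. Then one more space and a colon are matched, and lastly capture group 3 takes the space after the colon.
The substring ' 0 : ' is therefore replaced with '\2:\3' #=> '0: '. Notice that one space before '0' (the one captured by group 1) was removed, but should have been retained.
Matching then resumes at "28 : 37.02". The space before "28" has already been consumed by capture group 3, and \A cannot match in the middle of the string, so capture group 1 has nothing to match in front of "28". No later position works either, so gsub returns " 0: 28 : 37.02" with the second colon untouched.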
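To see the consumed space directly, here is a small check using the questioner's pattern unchanged, assuming the string has three leading spaces as the explanation implies. The second pattern, r2, is a hypothetical tweak (not from either answer): it turns capture group 3 into a lookahead so the character after each colon is checked but left for the next match, and it puts capture group 1 back into the replacement.

```ruby
# The questioner's pattern, unchanged.
r = /(\A|[ \u00A0|\r|\n|\v|\f])(\d?\d)[ \u00A0|\r|\n|\v|\f]:(\d|[ \u00A0|\r|\n|\v|\f]|\z)/

# Assumed input: three leading spaces.
line = "   0 : 28 : 37.02"

# scan returns one array of captures per match. There is only one match,
# and capture group 3 holds the space that "28" would have needed.
p line.scan(r)            # => [[" ", "0", " "]]

# Hypothetical tweak: group 3 becomes a lookahead (?=...), so nothing after
# the colon is consumed, and \1 keeps the space captured before the digits.
r2 = /(\A|[ \u00A0|\r|\n|\v|\f])(\d?\d)[ \u00A0|\r|\n|\v|\f]:(?=\d|[ \u00A0|\r|\n|\v|\f]|\z)/
p line.gsub(r2, '\1\2:')  # => "   0: 28: 37.02"
```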
A solution
Here's how you can remove the last of one or more Unicode whitespace characters that are preceded by one or two digits (and not more) and are followed by a colon at the end of the string or a colon followed by a whitespace or digit. (Whew!)
def trim(str)
str.gsub(/\d+[[:space:]]+:(?![^[:space:]\d])/) do |s|
s[/\d+/].size > 2 ? s : s[0,s.size-2] << ':'
end
end
The regular expression reads, "match one or more digits, followed by one or more whitespace characters, followed by a colon (all of these characters are matched), not followed (negative lookahead) by a character other than a Unicode whitespace or digit". If there is a match, we check how many digits there are at its beginning. If there are more than two, the match is returned unchanged; otherwise the whitespace character immediately before the colon is removed from the match and the modified match is returned.
trim " 0 : 28 : 37.02"
#=> " 0: 28: 37.02"
trim " 0\v: 28 :37.02"
#=> " 0: 28:37.02"
trim " 0\u00A0: 28\n:37.02"
#=> " 0: 28:37.02"
trim " 123 : 28 : 37.02"
#=> " 123 : 28: 37.02"
trim " A12 : 28 :37.02"
#=> " A12: 28:37.02"
trim " 0 : 28 :"
#=> " 0: 28:"
trim " 0 : 28 :A"
#=> " 0: 28 :A"
If, as in the example, the only characters in the string are digits, whitespace and colons, the lookahead is not needed.
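For comparison, here is a hypothetical single-regex variant of trim (not part of the answer). It moves the digit-count check into the pattern: (?<!\d)\d{1,2} followed by whitespace can only match a run of at most two digits, and \K drops everything matched so far from the replacement, so only the last whitespace character before the colon is deleted. It has only been checked against the examples above.

```ruby
# Hypothetical variant of trim, with the "no more than two digits"
# rule expressed in the regex itself instead of in a block.
def trim_k(str)
  str.gsub(/
    (?<!\d)                  # not preceded by a digit
    \d{1,2}                  # one or two digits
    [[:space:]]*             # any earlier whitespace is kept...
    \K                       # ...because \K discards everything matched so far
    [[:space:]]              # the last whitespace character before the colon
    (?=:(?![^[:space:]\d]))  # a colon not followed by a non-space, non-digit
  /x, '')
end

p trim_k(" 0 : 28 : 37.02")    # => " 0: 28: 37.02"
p trim_k(" 123 : 28 : 37.02")  # => " 123 : 28: 37.02"
p trim_k(" 0 : 28 :A")         # => " 0: 28 :A"
```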
You can use Ruby's \p{} construct, \p{Space}, in place of the POSIX expression [[:space:]]. Both match a class of Unicode whitespace characters, including those shown in the examples.
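A quick way to confirm that the two spellings agree on the whitespace characters used in the examples; the list of characters is just a sample chosen for illustration. For contrast, Ruby's \s covers only ASCII whitespace, so it does not match the no-break space U+00A0.

```ruby
# Sample of whitespace characters, including the no-break space U+00A0.
samples = [" ", "\t", "\n", "\v", "\f", "\r", "\u00A0"]

posix = samples.all? { |c| c.match?(/\A[[:space:]]\z/) }
prop  = samples.all? { |c| c.match?(/\A\p{Space}\z/) }
p [posix, prop]          # => [true, true]

# \s is ASCII-only in Ruby, so it misses U+00A0.
p "\u00A0".match?(/\s/)  # => false
```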
Upvotes: 3
Reputation: 168091
Excluding a third digit can be done with a negative lookbehind, but because the one or two digits themselves are of variable length, you cannot use a positive lookbehind for that part.
line.gsub(/(?<!\d)(\d{1,2}) (?=:(?:[ \d]|$))/, '\1')
# => " 0: 28: 37.02"
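In a character class, \$ is a literal dollar sign rather than an end-of-line anchor, so a pattern ending in [ \d\$] does not match when the colon ends the line. The sketch below compares that form with one that tests for end of line outside the class, (?:[ \d]|$); the sample strings are made up for illustration.

```ruby
# Class-based lookahead: \$ here is just the character "$".
literal  = /(?<!\d)(\d{1,2}) (?=:[ \d\$])/
# Alternation outside the class, so $ really means end of line.
anchored = /(?<!\d)(\d{1,2}) (?=:(?:[ \d]|$))/

s = " 0 : 28 :"
p s.gsub(literal, '\1')   # => " 0: 28 :"
p s.gsub(anchored, '\1')  # => " 0: 28:"

# Three digits are left alone: (?<!\d) prevents a match starting inside "123".
p " 123 : 28 : 37.02".gsub(anchored, '\1')  # => " 123 : 28: 37.02"
```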
Upvotes: 1