Reputation: 817
I've come across a problem that seems simple, yet the solution I've found isn't satisfying. I want to replace any period followed by a number with a comma. For example,
strings <- c("22.222.222", "12.323", "don.t replace")
should be transformed into
[1] "22,222,222" "12,323" "don.t replace"
The first solution I attempted was
str_replace_all(strings,
"(?<=\\d+)\\.",
",")
but I got the following error message:
Error in stri_replace_all_regex(string, pattern, replacement, vectorize_all = vec, :
Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)
When I use
str_replace_all(strings,
"(?<=\\d{1,3})\\.",
",")
I get the correct transformation. However, hard-coding 1-3 digits is not what I want, and I don't understand why \\d+ doesn't work but \\d{1,3} does.
Upvotes: 0
Views: 274
Reputation: 626950
You do not need any lookarounds here since the digits are a known subpattern and you do not need any overlapping matches. Just use a capturing group around the digits and restore them in the result with the backreference:
> library(stringr)
> strings <- c("22.222.222", "12.323", "don.t replace")
> str_replace_all(strings, "(\\d+)\\.", "\\1,")
[1] "22,222,222" "12,323" "don.t replace"
Or with base R gsub:
> gsub("(\\d+)\\.", "\\1,", strings)
[1] "22,222,222" "12,323" "don.t replace"
Note that the constrained-width lookbehind (?<=\\d{1,3}) works because the stringr regex flavor is ICU. ICU accepts a lookbehind as long as the maximum length of the pattern inside it can be calculated beforehand, so a limiting quantifier with both min and max values works OK, while \\d+ has no upper bound and is rejected. The bounded form will not work with PCRE (perl=TRUE) regexps in gsub. Infinite-width lookbehind (with + and * quantifiers inside) is only supported in a few flavors: the Python PyPI regex module, .NET, the RegexBuddy tool, and Vim.
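If you want code that behaves the same whichever engine ends up rejecting the lookbehind, one option is to try the lookbehind pattern and fall back to the capturing-group pattern on error. This is only a sketch: the tryCatch wrapper is my own illustration, and whether a particular R build's PCRE accepts the bounded lookbehind may depend on the PCRE version it was linked against, so the code does not assume either outcome.

```r
strings <- c("22.222.222", "12.323", "don.t replace")

# Try the bounded lookbehind under PCRE. If the engine rejects the pattern,
# gsub() raises an error, and we fall back to the capturing-group version,
# which does not need any lookbehind at all.
result <- tryCatch(
  suppressWarnings(gsub("(?<=\\d{1,3})\\.", ",", strings, perl = TRUE)),
  error = function(e) gsub("(\\d+)\\.", "\\1,", strings)
)

print(result)
# [1] "22,222,222"    "12,323"        "don.t replace"
```

Both branches produce the same replacement for these inputs, which is the point: the capturing group is the portable choice.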
Upvotes: 2
Reputation: 781350
There's no need for the + quantifier; all you care about is matching the last digit in the sequence. So just put \d in the lookbehind.
str_replace_all(strings,
"(?<=\\d)\\.",
",")
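A quick way to convince yourself that one digit is enough is to compare the result against the capturing-group pattern (\\d+)\\. on a longer digit run. The sketch below uses base R gsub with perl = TRUE instead of stringr so it runs without extra packages; the fourth test string is an extra case added purely for illustration.

```r
# "1234567.89" is an added illustrative input with a long run of digits.
strings <- c("22.222.222", "12.323", "don.t replace", "1234567.89")

# Single-digit lookbehind: only the character right before the period matters.
lookbehind <- gsub("(?<=\\d)\\.", ",", strings, perl = TRUE)

# Capturing group over the whole digit run, restored with a backreference.
captured <- gsub("(\\d+)\\.", "\\1,", strings)

print(lookbehind)
# [1] "22,222,222"    "12,323"        "don.t replace" "1234567,89"

# Both patterns agree on every input.
stopifnot(identical(lookbehind, captured))
```

Since (?<=\\d) has a fixed width of one character, it is accepted by both ICU (stringr) and PCRE (perl = TRUE).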
Upvotes: 2