Reputation: 109924
How can I remove commas from the numeric parts of a string (the fastest approach is preferable) without affecting the other commas in the string? In the example below I want to remove the commas from the number portions, but the comma after "dogs" should remain (yes, I know the comma placement in 102,345,5 is wrong; I'm just throwing a corner case out there).
What I have:
x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
Desired outcome:
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
Stipulation: this must be done in base R, with no add-on packages.
Thank you in advance.
EDIT: Thank you Dason, Greg and Dirk. All of your responses worked very well. I was playing with something close to Dason's response but had the comma inside the parentheses. Now that I look at it, that doesn't even make sense. I microbenchmarked the responses because I need speed here (text data):
Unit: microseconds
         expr     min      lq  median      uq     max
1  Dason_0to9  14.461  15.395  15.861  16.328  25.191
2 Dason_digit  21.926  23.791  24.258  24.725  65.777
3        Dirk 127.354 128.287 128.754 129.686 154.410
4      Greg_1  18.193  19.126  19.127  19.594  27.990
5      Greg_2 125.021 125.954 126.421 127.353 185.666
+1 to all of you.
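For anyone who wants to repeat a comparison like this without an add-on package, here is a rough sketch using base R's system.time. It is not the code that produced the table above; the mapping from the labels in the table to the patterns is assumed from the answers, and the vector length is an arbitrary choice. Timings from system.time are much coarser than microbenchmark's, so treat the numbers as a rough guide only.

```r
# Same string as in the question
x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"

# Candidate expressions; the label-to-pattern mapping is assumed from the names
candidates <- list(
  Dason_0to9  = function(s) gsub(",([0-9])", "\\1", s),
  Dason_digit = function(s) gsub(",([[:digit:]])", "\\1", s),
  Dirk        = function(s) gsub("(\\d),(\\d)", "\\1\\2", s, perl = TRUE),
  Greg_1      = function(s) gsub("([0-9]),([0-9])", "\\1\\2", s),
  Greg_2      = function(s) gsub("(?<=\\d),(?=\\d)", "", s, perl = TRUE)
)

# Check that every candidate gives the same result on x
results <- vapply(candidates, function(f) f(x), character(1))
stopifnot(length(unique(results)) == 1)

# Time each candidate on a longer vector (the length is an arbitrary choice)
xs <- rep(x, 10000)
timings <- sapply(candidates, function(f) system.time(f(xs))[["elapsed"]])
print(timings)
```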
Upvotes: 7
Views: 6943
Reputation: 49650
Here are a couple of options:
> tmp <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
> gsub('([0-9]),([0-9])','\\1\\2', tmp )
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
> gsub('(?<=\\d),(?=\\d)','',tmp, perl=TRUE)
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
>
They both match a digit followed by a comma followed by a digit. The [0-9] and \d both match a single digit. In the R code \d is written as \\d: the extra \ escapes the backslash so that it makes it through to the regular expression.
The first expression captures the digit before the comma and the digit after the comma and uses them in the replacement string. It basically pulls them out and puts them back (but without putting the comma back).
The second version uses zero-length matches. The (?<=\\d) says that there needs to be a single digit before the comma in order for it to match, but the digit itself is not part of the match. The (?=\\d) says that there needs to be a digit after the comma in order for it to match, but it is not included in the match. So basically it matches a comma, but only if it is preceded and followed by a digit. Since only the comma is matched, the replacement string is empty, meaning the comma is deleted.
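One difference between the two versions shows up when digit groups are only one character wide, a case the example string does not cover. The sketch below uses a made-up input, y, to illustrate: the capturing version consumes the digit after each comma, so that digit cannot also serve as the digit before the next comma, while the lookaround version consumes only the comma.

```r
# Hypothetical input (not from the question): single-digit groups
y <- "1,2,3"

# Capturing version: the match "1,2" uses up the 2, so ",3" has no
# unconsumed digit in front of it and the second comma survives
capture_result <- gsub('([0-9]),([0-9])', '\\1\\2', y)

# Lookaround version: only the comma is consumed, so both commas match
lookaround_result <- gsub('(?<=\\d),(?=\\d)', '', y, perl = TRUE)

print(capture_result)     # "12,3"
print(lookaround_result)  # "123"
```

For input like the question's, where commas are separated by at least two digits, both versions give the same result.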
Upvotes: 6
Reputation: 368389
Using a Perl regexp and focusing on "digit comma digit", we replace each match with just the digits:
R> x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
R> gsub("(\\d),(\\d)", "\\1\\2", x, perl=TRUE)
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
R>
Upvotes: 7
Reputation: 61953
You could replace every match of the pattern (a comma followed by a digit) with the digit itself.
x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
gsub(",([[:digit:]])", "\\1", x)
#[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
#or
gsub(",([0-9])", "\\1", x)
#[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
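Note that this pattern only checks what comes after the comma, not what comes before it. The sketch below uses a made-up input, z, to show where that matters; the lookaround pattern is included only for comparison and is one possible way to also require a digit in front of the comma.

```r
# Hypothetical input (not from the question): a comma after a word,
# immediately followed by a digit
z <- "dogs,3 cats and 1,2,3"

# Comma-then-digit pattern: removes any comma that a digit follows
any_before <- gsub(",([0-9])", "\\1", z)

# Digit-comma-digit lookaround pattern: also requires a digit before the comma
digit_before <- gsub("(?<=[0-9]),(?=[0-9])", "", z, perl = TRUE)

print(any_before)    # "dogs3 cats and 123"
print(digit_before)  # "dogs,3 cats and 123"
```

If the text never has a comma directly followed by a digit outside of a number, the simpler pattern is enough.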
Upvotes: 9