Tyler Rinker

Reputation: 109924

Remove commas from the digit portions of a string

How can I remove commas from the digit portions of a string (the faster the better) without affecting the other commas in the string? In the example below I want to remove the commas from the numbers, but the comma after "dogs" should remain (yes, I know the commas in 102,345,5 are misplaced; I'm just throwing a corner case out there).

What I have:

x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"

Desired outcome:

[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"

Stipulation: this must be done in base R, with no add-on packages.

Thank you in advance.

EDIT: Thank you Dason, Greg and Dirk. All of your responses worked very well. I was playing with something close to Dason's response but had the comma inside the parentheses; looking at it now, that doesn't even make sense. I need speed here (text data), so I microbenchmarked all of the responses:

Unit: microseconds
         expr     min      lq  median      uq     max
1  Dason_0to9  14.461  15.395  15.861  16.328  25.191
2 Dason_digit  21.926  23.791  24.258  24.725  65.777
3        Dirk 127.354 128.287 128.754 129.686 154.410
4      Greg_1  18.193  19.126  19.127  19.594  27.990
5      Greg_2 125.021 125.954 126.421 127.353 185.666

+1 to all of you.
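The table above appears to come from the microbenchmark package, which is an add-on. As a base-R-only sketch, the code below first checks that every answer reproduces the desired string and then times each one with system.time(). The mapping of benchmark labels to expressions (for example Greg_1 as the capturing version and Greg_2 as the lookaround version) is an assumption, as are the names variants, ok, timings and the repetition count n; absolute timings will differ by machine.

```r
x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
desired <- "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"

# Assumed mapping of the benchmark labels to the expressions in the answers
variants <- list(
  Dason_0to9  = function(s) gsub(",([0-9])", "\\1", s),
  Dason_digit = function(s) gsub(",([[:digit:]])", "\\1", s),
  Dirk        = function(s) gsub("(\\d),(\\d)", "\\1\\2", s, perl = TRUE),
  Greg_1      = function(s) gsub("([0-9]),([0-9])", "\\1\\2", s),
  Greg_2      = function(s) gsub("(?<=\\d),(?=\\d)", "", s, perl = TRUE)
)

# Correctness first: each variant must reproduce the desired string
ok <- vapply(variants, function(f) identical(f(x), desired), logical(1))
print(ok)

# Base-R timing: elapsed seconds for n repeated calls of each variant
n <- 10000
timings <- vapply(variants, function(f) {
  system.time(for (i in seq_len(n)) f(x))[["elapsed"]]
}, numeric(1))
print(timings)
```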

Upvotes: 7

Views: 6943

Answers (3)

Greg Snow

Reputation: 49650

Here are a couple of options:

> tmp <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
> gsub('([0-9]),([0-9])','\\1\\2', tmp )
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
> gsub('(?<=\\d),(?=\\d)','',tmp, perl=TRUE)
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
> 

They both match a digit followed by a comma followed by a digit. The [0-9] and \d both match a single digit (in the R string "\\d", the extra \ escapes the second one so that it makes it through to the regular expression).
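To make the escaping point concrete, here is a small sketch; the variable names are illustrative only. Typing "\d" with a single backslash is rejected by R's parser as an unrecognized escape, so the pattern is written "\\d", which R stores as the two characters \ and d.

```r
# "\\d" in R source is a 2-character string: a backslash followed by "d"
pat <- "\\d"
nchar(pat)       # 2
cat(pat, "\n")   # prints \d

# [0-9] and \d each match exactly one digit
digits_bracket <- grepl("[0-9]", c("a", "7"))
digits_escape  <- grepl(pat, c("a", "7"), perl = TRUE)
digits_bracket   # FALSE TRUE
digits_escape    # FALSE TRUE
```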

The first expression captures the digit before the comma and the digit after it and uses them in the replacement string: it pulls them out and puts them back, just without the comma.
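Because the capturing version consumes the digit after the comma, it behaves differently on a run of single-digit groups. The sketch below wraps it in a hypothetical helper, capture(), and uses made-up inputs: in "1,2,3" the match "1,2" uses up the "2", so the search resumes at ",3" and that comma is left alone. The question's example is unaffected because no two commas there are separated by a single digit.

```r
# Greg's first (capturing) version, wrapped in a helper for illustration
capture <- function(s) gsub("([0-9]),([0-9])", "\\1\\2", s)

capture("3,242")  # "3242"
capture("1,2,3")  # "12,3": matching "1,2" consumes the "2",
                  # so the scan resumes at ",3" and that comma is kept
```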

The second version uses zero-length matches (lookarounds). The (?<=\\d) requires a digit before the comma for it to match, but the digit itself is not part of the match; the (?=\\d) requires a digit after the comma, again without including it in the match. So the pattern matches a comma only if it is preceded and followed by a digit, and since only the comma is matched, the empty replacement string simply deletes it.
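A quick way to see that only the comma is matched is to extract the matches with regmatches() and gregexpr(). The sketch below does that on the question's string, then shows that, because the lookarounds consume nothing, a run of single-digit groups such as the made-up "1,2,3" is also fully cleaned. The names pat and hits are illustrative.

```r
tmp <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
pat <- "(?<=\\d),(?=\\d)"

# What the pattern actually matched: only the commas themselves
hits <- regmatches(tmp, gregexpr(pat, tmp, perl = TRUE))[[1]]
hits   # "," "," ","

# Lookarounds consume no digits, so adjacent single-digit groups work too
gsub(pat, "", "1,2,3", perl = TRUE)   # "123"
```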

Upvotes: 6

Dirk is no longer here

Reputation: 368389

Using a Perl regexp and focusing on "digit comma digit", we then replace each match with just the digits:

R> x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
R> gsub("(\\d),(\\d)", "\\1\\2", x, perl=TRUE)
[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
R> 
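Since the question mentions text data, it may help to note that gsub() is vectorised over its x argument, so one call cleans a whole character vector without an R-level loop. A minimal sketch with Dirk's pattern; the second string and the names texts and cleaned are hypothetical.

```r
texts <- c("I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away",
           "1,000 and 2,500 rows, 3 columns")   # hypothetical second line
cleaned <- gsub("(\\d),(\\d)", "\\1\\2", texts, perl = TRUE)
cleaned
# [1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
# [2] "1000 and 2500 rows, 3 columns"
```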

Upvotes: 7

Dason

Reputation: 61953

You could replace every match of the pattern "comma followed by a digit" with the digit itself.

x <- "I want to see 102,345,5 dogs, but not too soo; it's 3,242 minutes away"
gsub(",([[:digit:]])", "\\1", x)
#[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
#or
gsub(",([0-9])", "\\1", x)
#[1] "I want to see 1023455 dogs, but not too soo; it's 3242 minutes away"
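This pattern only inspects what follows the comma, which has two consequences worth checking on your own data. The sketch below uses a hypothetical helper, dason(), and made-up inputs: each comma is its own match, so single-digit groups are fully cleaned, but a comma directly after a word is also removed when a digit follows it with no space.

```r
dason <- function(s) gsub(",([0-9])", "\\1", s)

dason("1,2,3")        # "123": each comma is matched separately
dason("dogs,5 cats")  # "dogs5 cats": no digit is required before the comma
```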

Upvotes: 9
