Simon
Simon

Reputation: 1111

Remove commas within numbers only

I have a list of product sales (and their cost) which have frustratingly been concatenated into a single string, separated by commas. I ultimately need to separate out each product into unique rows which is easy enough with stringr::str_split.

However, the cost associated with each product has comma to show thousands e.g. 1,000.00 or 38,647.89. Therefore str_split is splitting products incorrectly as it hits commas within a product's cost.

I was wondering what the best tidyverse solution would be to remove all commas which are surrounded by numbers so that 1,000.00 becomes 1000.00 and 38,647.89 becomes 38647.89. Once these commas are removed I can str_split on the commas which delimit the products and thus split each unique product into its own row.

Here is a dummy dataset:

df<-data.frame(id = c(1, 2), product = c("1 Car at $38,678.49, 1 Truck at $78,468.00, 1 Motorbike at $5,634.78", "1 Car at $38,678.49, 1 Truck at $78,468.00, 1 Motorbike at $5,634.78"))

df

  id                                                              product
1 1 Car at $38,678.49, 1 Truck at $78,468.00, 1 Motorbike at $5,634.78
2 1 Car at $38,678.49, 1 Truck at $78,468.00, 1 Motorbike at $5,634.78

Expected outcome:

  id                                                              product
1  1 1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78
2  2 1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78

Upvotes: 0

Views: 309

Answers (4)

indubitably
indubitably

Reputation: 297

library(tidyverse)
df$product <- str_replace_all(df$product, "(?<=\\d),(?=\\d)", "")
df
  id                                                           product
1  1 1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78
2  2 1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78

Upvotes: 0

Sotos
Sotos

Reputation: 51592

A way via base R can be,

sapply(strsplit(as.character(df$product), ' '), function(i)paste(sub(',', '', i), collapse = ' '))
#[1] "1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78" "1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78"

Upvotes: 0

Aron Strandberg
Aron Strandberg

Reputation: 3080

df %>%
  mutate(product = product %>% str_replace_all("([0-9]),([0-9])", "\\1\\2"))

Result

  id                                                           product
1  1 1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78
2  2 1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78

Upvotes: 4

user2974951
user2974951

Reputation: 10375

> apply(df,1,function(x){gsub(",([0-9])","\\1",x[2])})

[1] "1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78"
[2] "1 Car at $38678.49, 1 Truck at $78468.00, 1 Motorbike at $5634.78"

Upvotes: 1

Related Questions