Wendy

Reputation: 144

Remove decimal point from strings in entire column

I am working on a data set with columns with numbers like this:

icd9code
285.21
593.9
285.21
v04.81

In order to run the R comorbidities package, I need to change them to 5-character codes without decimal points.

So they need to look like this:

icd9code
28521
59390
28521
v0481

What function can I use? In particular, how can I get it to append a 0 when the code has only 4 digits? Also, how should I handle codes that start with 'v'?

Upvotes: 2

Views: 2084

Answers (3)

Josh O'Brien

Reputation: 162461

Here's a vectorized solution:

x <- c("285.21", "593.9", "285.21", "v04.81")

substr(gsub("\\.", "", paste0(x, "00000")), 1, 5)
# [1] "28521" "59390" "28521" "v0481"
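To make the steps explicit, here is a minimal sketch that wraps the same pad-then-truncate idea in a function and applies it to a data frame column. The names `strip_icd9`, `width` and `df` are hypothetical, chosen only for illustration, and the sketch assumes every code should end up exactly 5 characters long.

```r
# Hypothetical helper: same idea as the one-liner above, one step per line
strip_icd9 <- function(codes, width = 5) {
  # Append more zeros than could ever be needed, e.g. "593.9" -> "593.900000"
  padded <- paste0(codes, strrep("0", width))
  # Drop the literal decimal point, e.g. "593.900000" -> "593900000"
  no_dot <- gsub(".", "", padded, fixed = TRUE)
  # Keep only the first `width` characters, e.g. "593900000" -> "59390"
  substr(no_dot, 1, width)
}

# Assumed example data frame with the codes stored as character strings
df <- data.frame(icd9code = c("285.21", "593.9", "285.21", "v04.81"),
                 stringsAsFactors = FALSE)
df$icd9code <- strip_icd9(df$icd9code)
```

Because every step is vectorized, the whole column is converted in one call, and codes starting with 'v' need no special handling since they are treated as plain strings.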

Upvotes: 4

marbel

Reputation: 7714

Here is another way to solve it, in case there are several columns that need the replacement. I'm sure there are better ways to do this, but the logic is clear: 1) split the strings in each column at the decimal point, 2) check the number of characters after the decimal point and pad accordingly.

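# stringsAsFactors = FALSE keeps the codes as character strings on R < 4.0 too
char <- data.frame(icd9code1 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code2 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code3 = c("285.21", "593.9", "285.21", "v04.81"),
                   stringsAsFactors = FALSE)

for (col in seq_len(ncol(char))) {
  # Split each code into the parts before and after the decimal point
  split_str <- strsplit(char[, col], "\\.")

  for (i in seq_len(nrow(char))) {
    # Remove the decimal point from the current code
    stripped <- gsub("\\.", "", char[i, col])
    if (nchar(split_str[[i]][2]) == 1) {
      # Only one digit after the decimal point: pad with a trailing "0"
      char[i, col] <- paste0(stripped, "0")
    } else {
      char[i, col] <- stripped
    }
  }
}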

# > char
#   icd9code1 icd9code2 icd9code3
# 1     28521     28521     28521
# 2     59390     59390     59390
# 3     28521     28521     28521
# 4     v0481     v0481     v0481
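The same two steps can also be written without explicit loops. The following is only a sketch of one possible vectorized variant, not a replacement for the code above: it rebuilds a smaller version of the example data frame, and like the loop it pads only when exactly one character follows the decimal point.

```r
# Example data with two columns, stored as character strings
char <- data.frame(icd9code1 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code2 = c("285.21", "593.9", "285.21", "v04.81"),
                   stringsAsFactors = FALSE)

# char[] <- lapply(...) replaces every column but keeps the data frame shape
char[] <- lapply(char, function(col) {
  # Remove the literal decimal point
  stripped <- gsub(".", "", col, fixed = TRUE)
  # Count the characters after the decimal point (0 if there is none)
  decimals <- nchar(sub("^[^.]*\\.?", "", col))
  # Pad with a single "0" only when one character followed the point
  ifelse(decimals == 1, paste0(stripped, "0"), stripped)
})
```

Unlike the loop, this variant leaves a code without any decimal point unchanged instead of failing on the missing second piece.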

Upvotes: 1

thelatemail

Reputation: 93938

It's not all that pretty, but it should work on all systems:

x <- scan(text="285.21 593.9 285.21 v04.81", what="character")
#[1] "285.21" "593.9"  "285.21" "v04.81"

res <- gsub("\\.","",x)
mapply(paste0, res, sapply(5-nchar(res),rep,x="0"))

#  28521    5939   28521   v0481 
#"28521" "59390" "28521" "v0481" 
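One caveat: if a code is two or more characters short, `rep` returns several "0" strings and `paste0` then produces more than one element for that code. On R 3.3.0 or later, `strrep` avoids this. The sketch below assumes that right-padding with zeros is what you want for such codes; the extra value "593" is made up for illustration and is not from the question.

```r
# "593" is an invented extra value that needs two zeros of padding
x <- c("285.21", "593.9", "285.21", "v04.81", "593")

# Remove the literal decimal point
res <- gsub(".", "", x, fixed = TRUE)

# Number of zeros each code is missing, never negative
n_pad <- pmax(0, 5 - nchar(res))

# strrep() builds one string of repeated zeros per element
padded <- paste0(res, strrep("0", n_pad))
```

Here `padded` is a plain unnamed character vector with one element per input code.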

Upvotes: 3
