Reputation: 144
I am working on a data set with columns of ICD-9 codes like this:
icd9code
285.21
593.9
285.21
v04.81
In order to run the R comorbidities package, I need to convert them to 5-digit codes without decimal points, so they need to look like this:
icd9code
28521
59390
28521
v0481
What function can I use? In particular, how can I get it to add a 0 at the end of the code if it has only 4 digits? Also, how can I convert codes that start with 'v'?
Upvotes: 2
Views: 2084
Reputation: 162461
Here's a vectorized solution:
x <- c("285.21", "593.9", "285.21", "v04.81")
substr(gsub("\\.", "", paste0(x, "00000")), 1, 5)
# [1] "28521" "59390" "28521" "v0481"
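To apply the same one-liner to a column of a data frame, it can be wrapped in a small helper. This is a sketch under a few assumptions: the data frame `df` and the helper name `to_short_icd9` are hypothetical, the column may have been read as a factor (hence `as.character()`), and missing codes should stay `NA` rather than become `"NA000"`. `fixed = TRUE` makes `gsub()` match a literal `.` without escaping.

```r
# Hypothetical data frame standing in for the asker's data set
df <- data.frame(icd9code = c("285.21", "593.9", "285.21", "v04.81"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: right-pad with zeros, drop the literal ".",
# then keep the first 5 characters
to_short_icd9 <- function(code) {
  code <- as.character(code)  # guards against factor columns
  out <- substr(gsub(".", "", paste0(code, "00000"), fixed = TRUE), 1, 5)
  out[is.na(code)] <- NA_character_  # keep missing codes missing
  out
}

df$icd9code <- to_short_icd9(df$icd9code)
df$icd9code
# [1] "28521" "59390" "28521" "v0481"
```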
Upvotes: 4
Reputation: 7714
Here is another way to solve it, in case there are several columns that need the replacement. I'm sure there are better ways to do this, but the logic is clear: 1) split the strings in each column at the decimal point; 2) check the number of characters after the decimal point and replace accordingly.
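char <- data.frame(icd9code1 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code2 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code3 = c("285.21", "593.9", "285.21", "v04.81"),
                   stringsAsFactors = FALSE)  # strsplit() needs character, not factor

for (col in seq_len(ncol(char))) {
  # split each code at the decimal point: "593.9" -> c("593", "9")
  split_str <- strsplit(char[, col], "\\.")
  for (i in seq_len(nrow(char))) {
    # remove the decimal point
    code <- gsub("\\.", "", char[i, col])
    # exactly one digit after the decimal point: append a trailing "0"
    # (the is.na() check avoids an error for codes without a decimal point)
    if (!is.na(split_str[[i]][2]) && nchar(split_str[[i]][2]) == 1) {
      code <- paste0(code, "0")
    }
    char[i, col] <- code
  }
}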
# > char
# icd9code1 icd9code2 icd9code3
# 1 28521 28521 28521
# 2 59390 59390 59390
# 3 28521 28521 28521
# 4 v0481 v0481 v0481
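If the explicit loops are not needed, the columns can also be converted in one vectorized step with `lapply()`. Note that this sketch swaps in the pad-and-truncate rule from the first answer instead of the split-and-check logic above; the helper `to_short_icd9` is hypothetical, and selecting columns by the `^icd9code` name prefix is an assumption about how the columns are named.

```r
# Example data with several ICD-9 columns (mirrors the answer above)
char <- data.frame(icd9code1 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code2 = c("285.21", "593.9", "285.21", "v04.81"),
                   icd9code3 = c("285.21", "593.9", "285.21", "v04.81"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: remove the ".", right-pad with "0", keep 5 characters
to_short_icd9 <- function(code) {
  code <- as.character(code)
  substr(gsub(".", "", paste0(code, "00000"), fixed = TRUE), 1, 5)
}

# Assumed naming convention: only columns starting with "icd9code" are converted
cols <- grep("^icd9code", names(char))

# Assigning into char[cols] keeps the result a data.frame
char[cols] <- lapply(char[cols], to_short_icd9)
char
```

Working column by column avoids indexing individual cells, which is usually much faster on large data sets.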
Upvotes: 1
Reputation: 93938
It's not all that pretty, but it should work on all systems:
x <- scan(text="285.21 593.9 285.21 v04.81", what="character")
#[1] "285.21" "593.9" "285.21" "v04.81"
res <- gsub("\\.","",x)
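# collapse the zeros into one padding string per code; rep() alone returns
# one "0" per element, which breaks codes that need two or more zeros
mapply(paste0, res, sapply(5 - nchar(res), function(n) paste(rep("0", n), collapse = "")))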
# 28521 5939 28521 v0481
#"28521" "59390" "28521" "v0481"
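On R 3.3.0 or later, the padding can also be done without `mapply()` by using `strrep()`, which is vectorized over its `times` argument. This is a sketch; `pmax()` is added here as an assumption so that codes already 5 or more characters long get no padding instead of raising an error.

```r
x <- c("285.21", "593.9", "285.21", "v04.81")
res <- gsub(".", "", x, fixed = TRUE)

# strrep() repeats "0" the required number of times for each code;
# pmax() clamps the count at 0 for codes that are already long enough
padded <- paste0(res, strrep("0", pmax(0, 5 - nchar(res))))
padded
# [1] "28521" "59390" "28521" "v0481"
```

Unlike the `mapply()` version, the result is an unnamed character vector.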
Upvotes: 3