Reputation: 749
I have a large data frame where percentages are written as strings such as 10% rather than as decimals such as .1. Not all columns are percentages, but quite a few are.
Is there an elegant way to convert all % values into decimals? I'm especially concerned about percentages that might be greater than 100%, and I'd like the rule to apply to the entire tibble so that I don't have to figure out which columns to target.
Example, in case that isn't clear. From this:
tibble(cola = c("hello", "good bye", "hi there"), colb = c("10%", "20%", "100%"), colc = c(53, 67, 89),cold = c("10%", "200%", "50%") )
to this:
tibble(cola = c("hello", "good bye", "hi there"), colb = c(.10, .20, 1.0), colc = c(53, 67, 89),cold = c(.10, 2.0, .5) )
Thanks.
Upvotes: 2
Views: 648
Reputation: 26353
Using base R, we can find the columns in which all entries end with "%", remove the trailing "%", and divide by 100.
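# Flag the character columns in which every entry ends with "%"
idx <- rapply(dat, f = function(x) all(endsWith(x, "%")), classes = "character")
dat[names(idx)[idx]] <- lapply(dat[names(idx)[idx]], function(x) {
  # as.numeric() rather than as.integer() so values like "12.5%" are not truncated
  as.numeric(sub("%$", "", x)) / 100
})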
Result
dat
# cola colb colc cold
#1 hello 0.1 53 0.1
#2 good bye 0.2 67 2.0
#3 hi there 1.0 89 0.5
data
dat <-
data.frame(
cola = c("hello", "good bye", "hi there"),
colb = c("10%", "20%", "100%"),
colc = c(53, 67, 89),
cold = c("10%", "200%", "50%")
)
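As a rough sketch, the same base R idea can be wrapped in a reusable function. The name percent_to_decimal, the handling of NA values, and the "12.5%" test value are assumptions added for illustration and are not part of the answer above; the sketch has only been considered for plain data frames.

```r
# Hypothetical helper: convert every character column whose non-missing
# values all end in "%" into proportions; other columns are left untouched.
percent_to_decimal <- function(df) {
  is_pct <- vapply(
    df,
    function(x) {
      is.character(x) && any(!is.na(x)) && all(endsWith(x[!is.na(x)], "%"))
    },
    logical(1)
  )
  if (any(is_pct)) {
    # as.numeric() keeps fractional percentages such as "12.5%"
    df[is_pct] <- lapply(df[is_pct], function(x) as.numeric(sub("%$", "", x)) / 100)
  }
  df
}

dat <- data.frame(
  cola = c("hello", "good bye", "hi there"),
  colb = c("10%", "20%", "100%"),
  colc = c(53, 67, 89),
  cold = c("12.5%", "200%", NA),
  stringsAsFactors = FALSE
)

res <- percent_to_decimal(dat)
str(res)
```

Values above 100% need no special handling here, since "200%" simply becomes 2.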
Upvotes: 3
Reputation: 76585
Write an auxiliary function and use mutate_if based on its value.
library(dplyr)
# TRUE for columns in which at least one value ends with "%"
is.percentage <- function(x) any(grepl("%$", x))
df1 %>%
  mutate_if(is.percentage, ~as.numeric(sub("%", "", .))/100)
## A tibble: 3 x 4
# cola colb colc cold
# <chr> <dbl> <dbl> <dbl>
#1 hello 0.1 53 0.1
#2 good bye 0.2 67 2
#3 hi there 1 89 0.5
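One design point worth noting: is.percentage uses any(), so a column counts as a percentage column as soon as a single value ends in "%". The base R sketch below contrasts that with a stricter all()-based check. The names has_any_pct and all_pct and the mixed example vector are hypothetical and only meant to illustrate the difference.

```r
# Lenient predicate: TRUE if at least one value ends in "%"
has_any_pct <- function(x) is.character(x) && any(grepl("%$", x))

# Stricter, hypothetical alternative: TRUE only if every non-missing
# value ends in "%"
all_pct <- function(x) {
  x <- x[!is.na(x)]
  is.character(x) && length(x) > 0 && all(grepl("%$", x))
}

mixed <- c("10%", "n/a", "50%")
has_any_pct(mixed)  # TRUE
all_pct(mixed)      # FALSE

# Converting the mixed column turns "n/a" into NA (with a coercion warning,
# suppressed here)
converted <- suppressWarnings(as.numeric(sub("%$", "", mixed)) / 100)
converted
```

With the lenient check, a mixed column like this would be converted and its non-percentage entries would become NA; the stricter check would leave the column as text.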
Upvotes: 5
Reputation: 887541
Here is one option with across/mutate, where we select columns that have character class and (&&) any value containing %, mutate across those columns, extract the numeric part with parse_number, and divide by 100.
library(dplyr) # 1.0.0
library(stringr)
df1 %>%
  mutate(across(where(~ is.character(.) && any(str_detect(., "%"))),
                ~ readr::parse_number(.) / 100))
# A tibble: 3 x 4
# cola colb colc cold
# <chr> <dbl> <dbl> <dbl>
#1 hello 0.1 53 0.1
#2 good bye 0.2 67 2
#3 hi there 1 89 0.5
Upvotes: 3