seansteele

Reputation: 749

Convert all percentages to decimals in R

I have a large data frame where the percentages are written as 10% and not .1. Not all columns are percentage, but quite a few are.

Is there an elegant way to convert all percentages into decimals? I especially need it to handle percentages greater than 100%, and I'd like a rule that can be applied to the entire tibble so I don't have to figure out which columns to target.

Example, in case that's not clear. This:

tibble(cola = c("hello", "good bye", "hi there"), colb = c("10%", "20%", "100%"), colc = c(53, 67, 89), cold = c("10%", "200%", "50%"))

to this:

tibble(cola = c("hello", "good bye", "hi there"), colb = c(.10, .20, 1.0), colc = c(53, 67, 89), cold = c(.10, 2.0, .5))

Thanks.

Upvotes: 2

Views: 648

Answers (3)

markus

Reputation: 26353

Using base R, we can find the character columns in which every entry ends with "%", strip that trailing "%", and divide by 100.

idx <- rapply(dat, f = function(x) all(endsWith(x, "%")), classes = "character")
dat[names(idx)[idx]] <- lapply(dat[names(idx)[idx]], function(x) {
  # as.numeric() rather than as.integer(), so a value like "12.5%" is not truncated
  as.numeric(sub("%$", "", x)) / 100
})

Result

dat
#      cola colb colc cold
#1    hello  0.1   53  0.1
#2 good bye  0.2   67  2.0
#3 hi there  1.0   89  0.5

data

dat <-
  data.frame(
    cola = c("hello", "good bye", "hi there"),
    colb = c("10%", "20%", "100%"),
    colc = c(53, 67, 89),
    cold = c("10%", "200%", "50%")
  )
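The same steps can be wrapped in a reusable function. This is only a sketch: the name `pct_to_decimal` and the sample data `dat2` are made up for illustration, and it assumes that missing values and fractional percentages such as "12.5%" may occur.

```r
# Hypothetical helper (not part of the original answer): convert every
# character column whose non-missing entries all end in "%".
pct_to_decimal <- function(df) {
  is_pct <- vapply(
    df,
    function(x) {
      # Ignore NAs when checking, but require at least one real value.
      is.character(x) && any(!is.na(x)) && all(endsWith(x[!is.na(x)], "%"))
    },
    logical(1)
  )
  if (any(is_pct)) {
    # Strip the trailing "%" and rescale; NA entries stay NA.
    df[is_pct] <- lapply(df[is_pct], function(x) as.numeric(sub("%$", "", x)) / 100)
  }
  df
}

# Made-up data with a fractional percentage and a missing value.
dat2 <- data.frame(
  label = c("a", "b", "c"),
  rate  = c("12.5%", NA, "250%"),
  n     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

res <- pct_to_decimal(dat2)
str(res)
```

Columns that are not entirely percentages, such as `label` and `n` here, are returned unchanged.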

Upvotes: 3

Rui Barradas

Reputation: 76585

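Write an auxiliary predicate function and pass it to mutate_if, which transforms only the columns for which the predicate returns TRUE.

library(dplyr)

# TRUE if any entry of the column ends with "%"
is.percentage <- function(x) any(grepl("%$", x))

# df1 is the tibble from the question
df1 %>%
  mutate_if(is.percentage, ~as.numeric(sub("%$", "", .))/100)
# A tibble: 3 x 4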
#  cola      colb  colc  cold
#  <chr>    <dbl> <dbl> <dbl>
#1 hello      0.1    53   0.1
#2 good bye   0.2    67   2  
#3 hi there   1      89   0.5
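One thing to watch with an any()-based check: a free-text column where only some entries happen to end in "%" also matches and would be turned into NA. Below is a sketch of a stricter variant, assuming dplyr >= 1.0.0 (where across() is available); the predicate `is_all_percentage` and the tibble `df2` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical stricter predicate: a character column whose non-missing
# entries ALL end in "%".
is_all_percentage <- function(x) {
  is.character(x) && any(!is.na(x)) && all(grepl("%$", x[!is.na(x)]))
}

# Made-up data: `note` is free text in which one entry ends in "%".
df2 <- tibble(
  note = c("up 5%", "flat", "down"),
  rate = c("10%", "200%", NA)
)

out <- df2 %>%
  mutate(across(where(is_all_percentage), ~ as.numeric(sub("%$", "", .x)) / 100))

out
```

Here only `rate` is converted; `note` stays a character column because not all of its entries are percentages.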

Upvotes: 5

akrun

Reputation: 887541

Here is one option using mutate with across: select the columns that are of class character and (&&) contain at least one value with a %, then, across those columns, extract the numeric part with parse_number and divide by 100.

library(dplyr) # 1.0.0
library(stringr)
df1 %>% 
    mutate(across(where(~ is.character(.) &&
         any(str_detect(., "%"))), ~ readr::parse_number(.)/100))
# A tibble: 3 x 4
#  cola      colb  colc  cold
#  <chr>    <dbl> <dbl> <dbl>
#1 hello      0.1    53   0.1
#2 good bye   0.2    67   2  
#3 hi there   1      89   0.5
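To illustrate why parse_number can be a convenient choice, here is a small comparison with plain as.numeric. The input vector is made up, and the sketch assumes readr's default locale, in which "," is treated as a grouping mark.

```r
library(readr)

# Hypothetical inputs: a thousands separator and a fractional percentage.
x <- c("1,250%", "12.5%", "7%")

# parse_number() drops the non-numeric characters around the number and,
# under the default locale, ignores "," as a grouping mark.
parsed <- parse_number(x) / 100

# as.numeric() cannot read "1,250", so that entry becomes NA
# (the coercion warning is suppressed here).
naive <- suppressWarnings(as.numeric(sub("%$", "", x)) / 100)

parsed
naive
```

If the data never contain grouping marks, both approaches should give the same result.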

Upvotes: 3
