RL_Pug
RL_Pug

Reputation: 857

How to make certain portions of a data frame percentages?

Here is my data frame called cat_data

print(cat_data)

Metrics                2016    2017    2018    
Number of Cats         100     120      150     
Number Leaving         32      40       65
Number Staying         68      80       85 
Percent of Leavers     .32     .33      .43
Percent of Stayers     .68     .67      .57 

I want to convert only rows and 5 to percents with the percent symbol.

Here is my desired output.

Metrics                2016    2017    2018    
Number of Cats         100     120      150     
Number Leaving         32      40       65
Number Staying         68      80       85 
Percent of Leavers     32%     33%      43%
Percent of Stayers     68%     67%      57% 

I tried this but I couldn't get it to work.

cat_data[4:5,2:4] <- paste0(cat_data[4:5,2:4] * 100,%)

Can anyone tell me what I need to fix? thank you.

Upvotes: 0

Views: 643

Answers (5)

stefanH
stefanH

Reputation: 353

this should work as well

EDIT

The paste function requires a vector as input, but your input cat_data[4:5,2:4] is a dataframe.

The apply function takes array as input and outputs a vector.

cat_data[4:5,2:4] <- apply(cat_data[4:5,2:4]*100, 2, function(x) paste0(x, "%"))

you take your region of the data frame and multiply by 100. This will be your array input in the apply function. By indicating the margin = 2 the vectorization is performed by column. then you supply the function applied to the vector which in this case is the paste function with your desired % character.

Upvotes: 1

hammoire
hammoire

Reputation: 361

Here is a tidyverse solution. Hard to know what the structure of your data is, but it isn't 'tidy'. I assume you are trying to create a summary table. I have come across similar issues before when trying to do the same. Using the mutate_at function with case_when is one way, if you want to include the % sign then the cols need to be characters.

library(dplyr)
library(stringr)

a <- c("Metrics", "Number of Cats", "Number Leaving", "Number Staying", "Percent of Leavers", "Percent of Stayers")
b <- c(2016, 100, 32, 68, .32, .68)
c <- c(2017, 120, 40, 80, .33, .67)
d <- c(2018, 150, 65, 85, .43, .57)

df <- tibble(a = a ,b = b, c = c, d = d)

df %>% 
  mutate_at(.vars = c("b", "c", "d"), .funs = list(~case_when(a %in% c("Percent of    Leavers", "Percent of Stayers") ~ str_c(round(.x*100), " %"),
                                                            TRUE ~ as.character(.x))))

#OUTPUT

  a                  b     c     d    
  <chr>              <chr> <chr> <chr>
1 Metrics            2016  2017  2018 
2 Number of Cats     100   120   150  
3 Number Leaving     32    40    65   
4 Number Staying     68    80    85   
5 Percent of Leavers 32 %  33 %  43 % 
6 Percent of Stayers 68 %  67 %  57 % 

Upvotes: 0

OTStats
OTStats

Reputation: 1868

As @Phil mentioned in his answer, the issue is that you have conflicting data types. You'd have to convert values for fields 2016, 2017, and 2018 to character. One way would be mutating those fields as such:

# load packages
library(tidyverse)
library(scales)  # package with function for converting decimal to percent

df %>% 
  rowwise() %>% 
  mutate(`2016` = if_else(str_detect(Metrics, "Percent"), scales::percent(`2016`, accuracy = 1), as.character(`2016`))) %>% 
  mutate(`2017` = if_else(str_detect(Metrics, "Percent"), scales::percent(`2017`, accuracy = 1), as.character(`2017`))) %>% 
  mutate(`2018` = if_else(str_detect(Metrics, "Percent"), scales::percent(`2018`, accuracy = 1), as.character(`2018`)))

# # A tibble: 5 x 4
#   Metrics            `2016` `2017` `2018`
#   <fct>              <chr>  <chr>  <chr> 
# 1 Number of Cats     100    120    150   
# 2 Number Leaving     32     40     65    
# 3 Number Staying     68     80     85    
# 4 Percent of Leavers 32%    33%    43%   
# 5 Percent of Stayers 68%    67%    57%  

Upvotes: 0

Allan Cameron
Allan Cameron

Reputation: 173793

Are you sure you want to do this with a data frame in R? Formatting the numbers for presenting them is best done at the end of any analysis, and doing it within a data frame is an unusual choice.

It's not clear from the question what format your columns are in. Are they numeric, factor or character?

Without knowing this, the best way to do this in base R is probably to use lapply across every column, convert it to numeric via character, multiply any value that is less than one by 100, convert the whole column to character format then append a percentage sign to the converted numbers.

However, this will commit you to converting the whole data frame to character strings, so you will no longer be able to maths on them without further conversion. It might be best to re-think how you are going to use or present the data.

That said, here is an implementation of the above method:

as.data.frame(lapply(df, function(x)
{
  if(!any(grepl("Number", x))) 
  {
    x <- as.numeric(as.character(x))
    s <- which(x < 1)
    x[s] <- x[s] * 100 
    x <- as.character(x) 
    x[s] <- paste0(x[s], "%")
  }
  return(x)
}))
#>             Metrics X2016 X2017 X2018
#>1     Number of Cats   100   120   150
#>2     Number Leaving    32    40    65
#>3     Number Staying    68    80    85
#>4 Percent of Leavers   32%   33%   43%
#>5 Percent of Stayers   68%   67%   57%

Upvotes: 0

Phil
Phil

Reputation: 8107

Your code doesn't work because paste0() vectorizes, and your table is not properly set up to allow for vectorization.

It's a little clunky, but you can do this:

cat_data <- tibble::tribble(
  ~"Metrics", ~"2016", ~"2017", ~"2018",
  "Number of Cats", 100, 120,  150,
  "Number Leaving", 32, 40, 65,
  "Number Staying", 68, 80, 85,
  "Percent of Leavers", .32 , .33, .43,
  "Percent of Stayers", .68, .67, .57) # create data

percent_data <- cat_data[4:5,] # separate percent rows
cat_data <- cat_data[-(4:5),] # remove percent rows

for (i in 2:4) { # apply the desired transformation to each column
  percent_data[[i]] <- paste0(percent_data[[i]] * 100, "%")
}

cat_data <- rbind(cat_data, percent_data) # bind them back
cat_data

# A tibble: 5 x 4
  Metrics            `2016` `2017` `2018`
  <chr>              <chr>  <chr>  <chr> 
1 Number of Cats     100    120    150   
2 Number Leaving     32     40     65    
3 Number Staying     68     80     85    
4 Percent of Leavers 32%    33%    43%   
5 Percent of Stayers 68%    67%    57%  

Upvotes: 0

Related Questions