user13518916

Reputation: 11

Dplyr: Rename Tibble Output Columns With Factor Levels

I am trying to find a way to rename my factor levels (1, 2, 3) with girl, boy, other in the dplyr tibble output.

This is the code:

library(dplyr)
df1 %>%
dplyr::group_by(sex)%>%
dplyr::summarise(percent=100*n()/nrow(df1), n=n())

And my result is:

# A tibble: 3 x 3
    sex percent     n
  <int>   <dbl> <int>
1     1  52.1     731
2     2  47.1     661
3    NA   0.855    12

The desired result would be:

# A tibble: 3 x 3
        sex percent     n
      <int>   <dbl> <int>
Girl      1  52.1     731
Boy       2  47.1     661
Other    NA   0.855    12

Upvotes: 1

Views: 218

Answers (2)

Chuck P

Reputation: 3923

I happen to love the forcats package because when I'm done I can actually see what I did. Here is another solution that simply adds a step to the pipe before your existing code.

library(dplyr)
library(forcats)

sex <- sample(1:2, 100, replace = TRUE)
sex[[88]] <- NA
df1 <- data.frame(sex)

df1 %>% 
  mutate(newsex = fct_explicit_na(fct_recode(as_factor(sex), 
                                             Girl = "1", 
                                             Boy = "2" ), 
                                  na_level = "Other")) %>% 
  group_by(newsex, sex) %>%
  summarise(percent = 100 * n() / nrow(df1), n=n())
#> # A tibble: 3 x 4
#> # Groups:   newsex [3]
#>   newsex   sex percent     n
#>   <fct>  <int>   <dbl> <int>
#> 1 Girl       1      56    56
#> 2 Boy        2      43    43
#> 3 Other     NA       1     1

Created on 2020-05-11 by the reprex package (v0.3.0)
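
On newer forcats releases (1.0.0 or later), fct_explicit_na() is deprecated in favour of fct_na_value_to_level(). Below is a minimal sketch of the same idea under that assumption. The tiny df1 is made up for illustration and replaces the sampled data above, and the result is stored in a hypothetical variable res.

```r
library(dplyr)
library(forcats)

# Toy data (hypothetical): two 1s, two 2s and one missing value
df1 <- data.frame(sex = c(1L, 2L, 1L, NA, 2L))

res <- df1 %>%
  mutate(newsex = as_factor(sex) %>%
           fct_recode(Girl = "1", Boy = "2") %>%
           # assumes forcats >= 1.0.0; turns NA values into a real level
           fct_na_value_to_level(level = "Other")) %>%
  group_by(newsex) %>%
  summarise(percent = 100 * n() / nrow(df1), n = n())

print(res)
```

With this toy input the summary has the rows Girl, Boy and Other, with n of 2, 2 and 1.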

Upvotes: 1

NotThatKindODr

Reputation: 719

When posting, please provide some sample data to work with; it helps others test and make sure everything works properly. This problem is relatively simple, so the missing data shouldn't be an obstacle here.

If you want to replace the NA with literally any other number, you can do this:

df1 %>%
    dplyr::mutate(sex = ifelse(is.na(sex), 0, sex),
                  sex = factor(sex, 
                               levels = c(1,2,0), 
                               labels = c("Girl", "Boy", "Other"))) %>% 
    dplyr::group_by(sex)%>%
    dplyr::summarise(percent=100*n()/nrow(df1), n=n())

Otherwise, you can use case_when to assign the labels and then convert the column to a factor:

df1 %>%
  dplyr::mutate(sex = dplyr::case_when(
                        sex == 1 ~ "Girl",
                        sex == 2 ~ "Boy",
                        is.na(sex) ~ "Other") %>%
                  # as_factor() comes from forcats, so that package must be installed
                  forcats::as_factor()) %>%
  dplyr::group_by(sex) %>%
  dplyr::summarise(percent = 100 * n() / nrow(df1), n = n())
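
One thing to watch with as_factor() is that it orders levels by first appearance, so the rows of the summary follow the order of the data. A small sketch of the same case_when approach with explicit levels in base factor() instead, assuming you always want Girl, Boy, Other in that order. The toy df1 and the variable res are made up for illustration.

```r
library(dplyr)

# Toy data (hypothetical), deliberately starting with a 2 and an NA
df1 <- data.frame(sex = c(2L, NA, 1L, 1L))

res <- df1 %>%
  mutate(sex = case_when(sex == 1 ~ "Girl",
                         sex == 2 ~ "Boy",
                         is.na(sex) ~ "Other"),
         # explicit levels pin the row order of the summary
         sex = factor(sex, levels = c("Girl", "Boy", "Other"))) %>%
  group_by(sex) %>%
  summarise(percent = 100 * n() / nrow(df1), n = n())

print(res)
```

Here the rows come out as Girl, Boy, Other even though the data starts with a 2 and an NA.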

Upvotes: 0
