choij

Reputation: 257

R tidyverse continuous approach for calculating ratio between 2 character variables

I have been struggling to find a tidyverse approach for calculating the ratio between 2 character variables. The solution should be a single continuous pipeline using %>%.

Here is the data frame.

data <- data.frame(c('No', 'No', 'No', 'No', 'Yes', 'No'),
                c('No','Yes', 'No', 'Yes', 'Yes', 'No'))


colnames(data) <- c('smoke', 'diabetes')

data
#  smoke diabetes
#1    No       No
#2    No      Yes
#3    No       No
#4    No      Yes
#5   Yes      Yes
#6    No       No

With base R this is easy. I calculate the ratio of the number of patients who smoke to the number of patients who have diabetes:

#'[R base approach for calculating ratio of the number of patients who are smoker to the number of patients who have diabetes]


count1 <- table(data$smoke)
count2 <- table(data$diabetes)

# Get the Ratio by dividing the counts
ratio <- count1 / count2
ratio
#       No       Yes 
#1.6666667 0.3333333 

But with the tidyverse approach using %>% pipes, I find it confusing.

#'[Tidyverse 1 line with pipes %>% approach for calculating ratio of the number of patients who are smoker to the number of patients who have diabetes]

library(dplyr)

data %>% group_by(smoke, diabetes) %>% 
  mutate(ratio = sum(smoke == 'Yes') / sum(diabetes == 'Yes'))

    # Groups:   smoke, diabetes [3]
    #  smoke diabetes ratio
    #  <chr> <chr>    <dbl>
    #1 No    No         NaN
    #2 No    Yes          0
    #3 No    No         NaN
    #4 No    Yes          0
    #5 Yes   Yes          1
    #6 No    No         NaN

As you can see, I cannot get the same ratio as with the base R approach. How can I solve this problem? Should I use case_when()?

Thank you.

Upvotes: 1

Views: 244

Answers (1)

akrun

Reputation: 887163

Instead of grouping by both columns, we can count the 'Yes' values in each column with summarise() and across(), then compute the ratio of the two counts:

library(dplyr)
data %>% 
  summarise(across(everything(), ~ sum(. == 'Yes'))) %>% 
  mutate(ratio = smoke / diabetes)  # smokers / diabetics, as asked in the question
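
As for why the grouped attempt in the question returns NaN: after group_by(smoke, diabetes), the sums are computed inside each group, so the No/No group evaluates 0 / 0, which is NaN in R. If the goal is to reproduce the full base R output (a ratio for both 'No' and 'Yes'), one possible sketch is to reshape to long format, count, and reshape back. This assumes the tidyr package is available; the names variable, level and result are mine, not from the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
data <- data.frame(
  smoke    = c("No", "No", "No", "No", "Yes", "No"),
  diabetes = c("No", "Yes", "No", "Yes", "Yes", "No")
)

result <- data %>%
  # stack both columns: one row per observed (variable, level) pair
  pivot_longer(everything(), names_to = "variable", values_to = "level") %>%
  # count how often each level occurs within each variable
  count(variable, level) %>%
  # spread the counts back out: one count column per original variable
  pivot_wider(names_from = variable, values_from = n) %>%
  # ratio of smoke counts to diabetes counts, per level
  mutate(ratio = smoke / diabetes)

result
```

With the example data this should give 5/3 for 'No' and 1/3 for 'Yes', the same values as count1 / count2 in the question. If a level were missing from one of the columns, its count would come back as NA after pivot_wider(), so the ratio for that level would be NA as well.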

Upvotes: 1
