Reputation: 257
I have been struggling to find a continuous tidyverse approach, using pipes (%>%), for calculating the ratio between two character variables. The tidyverse approach should be a single continuous pipeline using %>%.
Here is the data frame:
data <- data.frame(c('No', 'No', 'No', 'No', 'Yes', 'No'),
c('No','Yes', 'No', 'Yes', 'Yes', 'No'))
colnames(data) <- c('smoke', 'diabetes')
data
# smoke diabetes
#1 No No
#2 No Yes
#3 No No
#4 No Yes
#5 Yes Yes
#6 No No
With base R it is easy. I want the ratio of the number of patients who smoke to the number of patients who have diabetes:
#'[R base approach for calculating ratio of the number of patients who are smoker to the number of patients who have diabetes]
count1 <- table(data$smoke)
count2 <- table(data$diabetes)
# Get the Ratio by dividing the counts
ratio <- count1 / count2
ratio
# No Yes
#1.6666667 0.3333333
But with the tidyverse approach using %>% pipes, I find it confusing:
#'[Tidyverse 1 line with pipes %>% approach for calculating ratio of the number of patients who are smoker to the number of patients who have diabetes]
data %>% group_by(smoke, diabetes) %>%
mutate(ratio = sum(smoke == 'Yes') / sum(diabetes == 'Yes'))
# Groups: smoke, diabetes [3]
# smoke diabetes ratio
# <chr> <chr> <dbl>
#1 No No NaN
#2 No Yes 0
#3 No No NaN
#4 No Yes 0
#5 Yes Yes 1
#6 No No NaN
As you can see, I cannot get the same ratio as with the base R approach.
How can I solve this problem? Should I use case_when()?
Thank you.
Upvotes: 1
Views: 244
Reputation: 887163
Instead of grouping by both columns, get the count of 'Yes' in each column with summarise() and across(), then compute the ratio of the two counts:
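library(dplyr)
data %>%
  # count the 'Yes' answers in every column
  summarise(across(everything(), ~ sum(. == 'Yes'))) %>%
  # smokers relative to patients with diabetes, as defined in the question
  mutate(ratio = smoke / diabetes)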
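If you also want the 'No' ratio, as in the base R table() output, one possible sketch is to reshape to long format, count each answer per column, and spread the counts back out. This assumes the tidyr package is available in addition to dplyr; the names `variable`, `answer`, and `ratios` are made up for illustration and are not from the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built with named columns
data <- data.frame(
  smoke    = c("No", "No", "No", "No", "Yes", "No"),
  diabetes = c("No", "Yes", "No", "Yes", "Yes", "No")
)

ratios <- data %>%
  # one row per (column name, answer) pair
  pivot_longer(everything(), names_to = "variable", values_to = "answer") %>%
  # number of 'No' and 'Yes' answers for each original column
  count(variable, answer) %>%
  # back to one column of counts per original variable
  pivot_wider(names_from = variable, values_from = n) %>%
  # smoke count divided by diabetes count, per answer
  mutate(ratio = smoke / diabetes)

ratios
# for this data the ratio is 5/3 for 'No' and 1/3 for 'Yes'
```

Note that if an answer never occurs in one of the columns, its count comes out as NA after pivot_wider(), so the ratio for that answer would be NA too.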
Upvotes: 1