Crazed

Reputation: 117

Calculating proportion of values in a column based on different column in R

Say I have a dataframe with two columns, such as

df <- data.frame(experiment = rep(c('e1', 'e2'), each = 3),
                 outcomes = c('NH', 'NH', 'NH', 'H', 'NH', 'H'))

For each value in one column, I want to calculate the proportion of rows that have a particular value in a different column. In my example, I want the proportion of outcomes in e1 and in e2 that are 'NH'. The final result should be:

experiment Proportion
e1 1
e2 0.333

Upvotes: 2

Views: 4230

Answers (2)

ThomasIsCoding

Reputation: 101335

Another base R option using aggregate

> aggregate(cbind(Proportion = outcomes=="NH") ~ experiment,df,mean)
  experiment Proportion
1         e1  1.0000000
2         e2  0.3333333
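
The one-liner works because outcomes == "NH" is a logical vector, and the mean of a logical vector is the share of TRUE values. A minimal sketch that spells out the same steps one at a time, assuming the question's data is stored as df (the helper column name is_nh is made up for illustration):

```r
# Assumption: the question's data frame is stored as `df`
df <- data.frame(experiment = rep(c("e1", "e2"), each = 3),
                 outcomes = c("NH", "NH", "NH", "H", "NH", "H"))

# Hypothetical helper column: TRUE where the outcome is "NH"
df$is_nh <- df$outcomes == "NH"

# Mean of a logical vector per group = proportion of TRUE in that group
res <- aggregate(is_nh ~ experiment, data = df, FUN = mean)
names(res)[2] <- "Proportion"
res
```

Changing "NH" in the comparison to another outcome value should give the proportion for that value instead.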

Upvotes: 2

akrun

Reputation: 887108

We could group by experiment and take the mean of the logical vector outcomes == 'NH'; the mean of a logical vector is the proportion of TRUE values

library(dplyr)
df1 %>%
   group_by(experiment) %>%
   summarise(Proportion = mean(outcomes == 'NH'))
# A tibble: 2 x 2
  experiment Proportion
  <chr>           <dbl>
1 e1              1    
2 e2              0.333

Or use table/proportions in base R

 proportions(table(df1), 1)[, 'NH', drop = FALSE]
          outcomes
experiment        NH
        e1 1.0000000
        e2 0.3333333
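
As far as I know, proportions() was only added in R 4.0.1; on older versions prop.table() should do the same job. A rough sketch under that assumption, which also reshapes the result into the two-column layout asked for in the question (the object names tab, props and res are only illustrative):

```r
# Same data as df1 in the data section of this answer
df1 <- data.frame(experiment = rep(c("e1", "e2"), each = 3),
                  outcomes = c("NH", "NH", "NH", "H", "NH", "H"))

# Contingency table: rows are experiments, columns are outcomes
tab <- table(df1$experiment, df1$outcomes)

# margin = 1 divides each row by its row total
props <- prop.table(tab, margin = 1)

# Keep only the "NH" column and return a plain data frame
res <- data.frame(experiment = rownames(props),
                  Proportion = as.vector(props[, "NH"]))
res
```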

data

df1 <- structure(list(experiment = c("e1", "e1", "e1", "e2", "e2", "e2"
), outcomes = c("NH", "NH", "NH", "H", "NH", "H")), class = "data.frame", 
row.names = c(NA, 
-6L))

Upvotes: 2
