Reputation: 117
Say I have a dataframe with two columns, such as
data.frame(experiment = rep(c('e1', 'e2'),each = 3),
outcomes = c('NH', 'NH', 'NH', 'H', 'NH', 'H'))
For each value in one column, I want to calculate the proportion of values in another column that equal a particular value. So for my example, I want to calculate the proportion of outcomes in e1 and in e2 that are 'NH'. The desired result is:
experiment | Proportion |
---|---|
e1 | 1 |
e2 | 0.333 |
Upvotes: 2
Views: 4230
Reputation: 101335
Another base R option using aggregate (assuming the question's data frame is stored as df)
> aggregate(cbind(Proportion = outcomes=="NH") ~ experiment,df,mean)
experiment Proportion
1 e1 1.0000000
2 e2 0.3333333
Upvotes: 2
Reputation: 887108
We could group by experiment and take the mean of the logical vector outcomes == 'NH'
library(dplyr)
df1 %>%
group_by(experiment) %>%
summarise(Proportion = mean(outcomes == 'NH'))
# A tibble: 2 x 2
experiment Proportion
<chr> <dbl>
1 e1 1
2 e2 0.333
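To spell out why the mean gives a proportion, here is a minimal base R sketch. The data frame name `d` and the helper variables `is_nh`, `overall`, and `by_group` are made up for illustration; the data is copied from the question.

```r
# Hypothetical copy of the question's data; the name `d` is an assumption.
d <- data.frame(experiment = rep(c("e1", "e2"), each = 3),
                outcomes = c("NH", "NH", "NH", "H", "NH", "H"))

# The comparison yields a logical vector (TRUE where the outcome is "NH").
is_nh <- d$outcomes == "NH"

# TRUE counts as 1 and FALSE as 0, so the mean is the share of TRUE values.
overall <- mean(is_nh)

# The same idea applied within each experiment.
by_group <- tapply(is_nh, d$experiment, mean)
print(by_group)
```

The grouped mean in the dplyr pipeline above does the same thing, just once per value of experiment.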
Or use table/proportions in base R
proportions(table(df1), 1)[, 'NH', drop = FALSE]
outcomes
experiment NH
e1 1.0000000
e2 0.3333333
Data:
df1 <- structure(list(experiment = c("e1", "e1", "e1", "e2", "e2", "e2"),
outcomes = c("NH", "NH", "NH", "H", "NH", "H")),
class = "data.frame", row.names = c(NA, -6L))
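If proportions() is not available (I believe it was only added in R 4.0.1), the older prop.table() should do the same job. A small sketch, rebuilding df1 so it runs on its own; the names `tab`, `row_props`, and `nh` are just for illustration:

```r
# Same data as df1 above, rebuilt here so the snippet is self-contained.
df1 <- data.frame(experiment = rep(c("e1", "e2"), each = 3),
                  outcomes = c("NH", "NH", "NH", "H", "NH", "H"))

# Cross-tabulate experiment (rows) by outcome (columns).
tab <- table(df1)

# margin = 1 divides each row by its row total,
# giving the share of each outcome within an experiment.
row_props <- prop.table(tab, margin = 1)

# Keep only the "NH" column as a named numeric vector.
nh <- row_props[, "NH"]
print(nh)
```

Dropping drop = FALSE, as done here, returns a plain named vector instead of a one-column table, which may be easier to feed into further calculations.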
Upvotes: 2