Reputation: 431
I'm trying to calculate the proportion (percent) of categories in each column of a dataset.
Example data:
df <- data.frame(
  "Size" = c("Y", "N", "N", "Y", "Y"),
  "Type" = c("N", "N", "N", "Y", "N"),
  "Age"  = c("N", "Y", "N", "Y", "N"),
  "Sex"  = c("N", "N", "N", "N", "N")
)
df
This produces a table like this:
  Size Type Age Sex
1    Y    N   N   N
2    N    N   Y   N
3    N    N   N   N
4    Y    Y   Y   N
5    Y    N   N   N
I've tried using prop.table to calculate the proportions for a single column:
prop.table(table(df$Size))
This works, but it only calculates the proportion of Y and N answers for one column. The dataset is quite large, so I'd like to calculate the proportions for every column at once.
My goal is to have a table that shows the proportion of "yes" answers for each column.
Like this:
Proportion Y
Size 0.60
Type 0.20
Age 0.40
Sex 0.00
I am relatively new to R, so any help would be appreciated!
Upvotes: 2
Views: 1040
Reputation: 46978
A dplyr approach:
library(dplyr)
df %>% summarise_all(~mean(.=="Y"))
If you have more than one group:
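# Build two copies of df, each tagged with a group label
df1 <- data.frame(class = "A", df)
df2 <- data.frame(class = "B", df)
# Make df2 differ from df1: every Size answer becomes "Y"
df2$Size <- rep("Y", 5)
newdf <- rbind(df1, df2)
# Proportion of "Y" per column within each class
newdf %>% group_by(class) %>% summarise_all(~ mean(. == "Y"))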
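For readers without dplyr installed, the grouped calculation above can also be sketched in base R with aggregate(). This is only an alternative illustration, not part of the original answer; the name by_class is made up for this example, and the data is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(
  Size = c("Y", "N", "N", "Y", "Y"),
  Type = c("N", "N", "N", "Y", "N"),
  Age  = c("N", "Y", "N", "Y", "N"),
  Sex  = c("N", "N", "N", "N", "N")
)

# Two groups, with group B given all-"Y" Size answers
df1 <- data.frame(class = "A", df)
df2 <- data.frame(class = "B", df)
df2$Size <- rep("Y", 5)
newdf <- rbind(df1, df2)

# Apply mean(x == "Y") to every non-grouping column, split by class
# (by_class is a hypothetical name used only in this sketch)
by_class <- aggregate(
  newdf[-1],
  by = list(class = newdf$class),
  FUN = function(x) mean(x == "Y")
)
by_class
```

The result has one row per class and one column per question, which should match the shape of the group_by() output.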
Upvotes: 2
Reputation: 389315
One way in base R would be to use apply column-wise on a logical matrix (df == "Y" is TRUE wherever the answer is "Y"):
apply(df == "Y", 2, mean)
# Size Type  Age  Sex
#  0.6  0.2  0.4  0.0
A simpler version with colMeans:
colMeans(df == "Y")
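To get the one-column layout shown in the question, the named vector returned by colMeans can be wrapped in a data frame. The sketch below assumes the header should read "Proportion Y" as in the question; prop_y and result are names chosen for this example only.

```r
# Example data from the question
df <- data.frame(
  Size = c("Y", "N", "N", "Y", "Y"),
  Type = c("N", "N", "N", "Y", "N"),
  Age  = c("N", "Y", "N", "Y", "N"),
  Sex  = c("N", "N", "N", "N", "N")
)

# Share of "Y" answers per column, as a named numeric vector
prop_y <- colMeans(df == "Y")

# One-column data frame; the vector's names become the row names,
# and check.names = FALSE keeps the space in the column header
result <- data.frame("Proportion Y" = prop_y, check.names = FALSE)
result
```

Keeping the proportions in a data frame rather than a bare vector makes it easier to add further columns later, for example the proportion of "N" answers.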
Upvotes: 3