CadisEtRama

Reputation: 1111

Calculate gender percentage from grouped data frame in R

I have a fairly large data frame that includes information on individuals divided into treatment groups. I am trying to generate variable means and gender percentages per group. I was able to calculate the means, but I am not sure how to get the gender percentages.

Below, I generated a small replica of what my data looks like:

library(plyr)
# create variables and data frame
sampleid <- 1:100
gender <- rep(c("female", "male"), c(50, 50))
score <- rnorm(100)
age <- sample(25:35, 100, replace = TRUE)
# 20 values (1,1,1,1,2,...,5); data.frame() recycles them to fill 100 rows
treatment <- rep(1:5, each = 4)
d <- data.frame(sampleid, gender, age, score, treatment)

> head(d)

  sampleid gender age      score treatment
1        1 female  34  1.6917201         1
2        2 female  26 -1.6189545         1
3        3 female  28  1.2867895         1
4        4 female  34 -0.5027578         1
5        5 female  29 -1.3652895         2
6        6 female  26 -2.4430843         2

I obtain the mean of each numeric column by:

groupstat<-ddply(d, .(treatment),numcolwise(mean))

which gives:

  treatment sampleid   age        score
1         1     42.5 29.15  0.142078574
2         2     46.5 29.50 -0.261492514
3         3     50.5 30.50 -0.188393235
4         4     54.5 30.45  0.003526078
5         5     58.5 30.55  0.062996737

However, I also need an additional column, "Percent Female", which should give the percentage of females within each of the treatment groups 1:5. Can someone help me add this?

Upvotes: 0

Views: 7683

Answers (2)

holzben

Reputation: 1471

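I would first split the data into treatment groups (split(d, f = d$treatment)) and then calculate the share of females in each group (function(x) sum(x$gender == "female")/length(x$gender)). Note that this returns a proportion between 0 and 1; multiply by 100 to get a percentage: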

sapply(split(d, f = d$treatment), function(x) sum(x$gender == "female")/length(x$gender))
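The same idea can be written more compactly, because the mean of a logical vector is the share of TRUE values. Below is a minimal sketch in base R; it rebuilds only the gender and treatment columns from the question, and the name pct_female is just an illustrative choice:

```r
# Rebuild the two relevant columns of the question's sample data
gender <- rep(c("female", "male"), c(50, 50))
treatment <- rep(rep(1:5, each = 4), times = 5)  # explicit form of the recycled vector
d <- data.frame(gender, treatment)

# mean() of a logical vector is the proportion of TRUE values;
# tapply() applies it within each treatment group
pct_female <- 100 * tapply(d$gender == "female", d$treatment, mean)
pct_female
```

For this sample data the result is 60, 60, 50, 40 and 40 percent for treatments 1 to 5, because the first 50 rows are all female and the treatment labels repeat every 20 rows.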

Upvotes: 1

Jota

Reputation: 17611

Try this:

groupstat <- ddply(d, .(treatment), summarise,
                   meansc = mean(score),
                   meanage = mean(age),
                   meanID = mean(sampleid),
                   nfem = length(gender[gender == "female"]),  # number of females per treatment group
                   nmale = length(gender[gender == "male"]),   # number of males per treatment group
                   percentfem = nfem / (nfem + nmale))         # proportion of females (multiply by 100 for a percentage)
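If you would rather keep the numcolwise(mean) call from the question and only attach the extra column, something along these lines might work. This is a sketch under a few assumptions: the column name percent_female is made up for illustration, set.seed() is added only so the random columns are reproducible, and the treatment vector is written out in full instead of relying on recycling:

```r
library(plyr)

# Sample data as in the question; set.seed() is an addition for reproducibility
set.seed(1)
d <- data.frame(
  sampleid  = 1:100,
  gender    = rep(c("female", "male"), c(50, 50)),
  age       = sample(25:35, 100, replace = TRUE),
  score     = rnorm(100),
  treatment = rep(rep(1:5, each = 4), times = 5)
)

# Means of all numeric columns per treatment group, as in the question
groupstat <- ddply(d, .(treatment), numcolwise(mean))

# Percentage of females per treatment group
pct <- ddply(d, .(treatment), summarise,
             percent_female = 100 * mean(gender == "female"))

# Attach the percentage by matching on the treatment label
groupstat$percent_female <- pct$percent_female[match(groupstat$treatment, pct$treatment)]
groupstat
```

Matching on the treatment label rather than on row position keeps the two summaries aligned even if their row order ever differs.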

Upvotes: 4
