Reputation: 1111
I have a fairly large data frame that includes information on individuals divided into treatment groups. I am trying to generate variable means and gender percentages per group. I was able to calculate the means, but I am not sure how to get the gender percentages.
Below, I generated a small replica of what my data looks like:
library(plyr)
# create variables and data frame
sampleid <- 1:100
gender <- rep(c("female", "male"), c(50, 50))
score <- rnorm(100)
age <- sample(25:35, 100, replace = TRUE)
treatment <- rep(1:5, each = 4)  # 20 values; data.frame() recycles them to 100 rows
d <- data.frame(sampleid, gender, age, score, treatment)
> head(d)
  sampleid gender age      score treatment
1        1 female  34  1.6917201         1
2        2 female  26 -1.6189545         1
3        3 female  28  1.2867895         1
4        4 female  34 -0.5027578         1
5        5 female  29 -1.3652895         2
6        6 female  26 -2.4430843         2
I obtain the mean of each numeric column by:
groupstat<-ddply(d, .(treatment),numcolwise(mean))
which gives:
  treatment sampleid   age        score
1         1     42.5 29.15  0.142078574
2         2     46.5 29.50 -0.261492514
3         3     50.5 30.50 -0.188393235
4         4     54.5 30.45  0.003526078
5         5     58.5 30.55  0.062996737
However, I also need an additional column, "Percent Female", giving the percentage of females within each treatment group (1 to 5). Can someone show me how to add it?
Upvotes: 0
Views: 7683
Reputation: 1471
I would first split the data into treatment groups (split(d, f = d$treatment)) and then calculate the proportion of females in each group (function(x) sum(x$gender == "female")/length(x$gender)). Multiply the result by 100 if you want a percentage:
sapply(split(d, f = d$treatment), function(x) sum(x$gender == "female")/length(x$gender))
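The same per-group proportion can also be sketched with tapply, relying on the fact that the mean of a logical vector is the share of TRUE values. This is only an illustration: it rebuilds just the gender and treatment columns from the question's sample data, and the name pct_female is made up for the example.

```r
# Rebuild only the columns needed here, mirroring the question's sample data
gender <- rep(c("female", "male"), c(50, 50))
treatment <- rep(rep(1:5, each = 4), times = 5)  # same values data.frame() recycling produced
d <- data.frame(gender, treatment)

# mean() of a logical vector is the proportion of TRUE values,
# so 100 * mean(...) is the percentage of females in each treatment group
pct_female <- 100 * tapply(d$gender == "female", d$treatment, mean)
pct_female
```

The result is a named vector with one entry per treatment group, so it can be attached to an existing summary table as long as the group order matches.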
Upvotes: 1
Reputation: 17611
Try this:
groupstat <- ddply(d, .(treatment), summarise,
                   meansc = mean(score),
                   meanage = mean(age),
                   meanID = mean(sampleid),
                   nfem = length(gender[gender == "female"]),  # number of females per treatment group
                   nmale = length(gender[gender == "male"]),   # number of males per treatment group
                   percentfem = 100 * nfem / (nfem + nmale))   # percent females by treatment group
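If plyr is not available, roughly the same table can be produced in base R with aggregate. This is a sketch of an alternative, not the approach used above: the helper column female and the output column percent_female are names invented for the example, and set.seed is added only so the random columns are reproducible.

```r
set.seed(1)  # assumption: fixed seed so score and age are reproducible

# Sample data as in the question
sampleid <- 1:100
gender <- rep(c("female", "male"), c(50, 50))
score <- rnorm(100)
age <- sample(25:35, 100, replace = TRUE)
treatment <- rep(rep(1:5, each = 4), times = 5)
d <- data.frame(sampleid, gender, age, score, treatment)

# Helper column (hypothetical name): TRUE for females, so its group mean is a proportion
d$female <- d$gender == "female"

# One call computes the group means of every column on the left-hand side
groupstat <- aggregate(cbind(sampleid, age, score, female) ~ treatment,
                       data = d, FUN = mean)

# Convert the proportion to a percentage and drop the helper column
groupstat$percent_female <- 100 * groupstat$female
groupstat$female <- NULL
groupstat
```

The group means and the percentage come out of a single pass because the logical helper column is averaged like any other numeric column.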
Upvotes: 4