Reputation: 173
ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event<-c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
df<-data.frame(ID,event)
1. How can I modify the code below to get the table shown after it? 2. How can I get the average frequency of each event across the IDs in which it occurs? For example, the average frequency for a would be (1+3+1+1)/4 = 1.5.
library(plyr)  # provides ddply() and summarise()
ddply(df, .(ID), summarise, N = sum(!is.na(ID)), frequency = length(event))
ID  N  Number-event-level  levels   frequency
R1  1  1                   a        a=1
R2  5  2                   b,c      b=3,c=2
R3  6  3                   M,a,s    M=1,a=3,s=2
R4  4  4                   f,y,b,a  f=1,y=1,b=1,a=1
R5  1  1                   m        m=1
R6  1  1                   a        a=1
Upvotes: 4
Views: 937
Reputation: 5249
Here's an answer for the first question:
ddply(df, .(ID), summarise,
      N = length(event),
      Number.event.level = length(unique(event)),
      levels = paste(sort(unique(event)), collapse = ","),
      frequency = paste(paste(sort(unique(event)), table(event)[table(event) > 0], sep = "="), collapse = ","))
#   ID N Number.event.level  levels       frequency
# 1 R1 1                  1       a             a=1
# 2 R2 5                  2     b,c         b=3,c=2
# 3 R3 6                  3   a,M,s     a=3,M=1,s=2
# 4 R4 4                  4 a,b,f,y a=1,b=1,f=1,y=1
# 5 R5 1                  1       m             m=1
# 6 R6 1                  1       a             a=1
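To see how the frequency string is put together, here is a minimal base-R sketch for a single ID (R3). The names ev, tab and freq are made up for illustration. The order of the pieces follows your locale's sort order, so the ordering may differ from the table above.

```r
# Events recorded for R3 in the question's data (ev is a hypothetical name)
ev <- c("M", "s", "a", "a", "a", "s")

# Count how often each distinct event occurs
tab <- table(ev)

# Glue each event name to its count, then join the pieces with commas
freq <- paste(names(tab), tab, sep = "=", collapse = ",")
print(freq)
```

Inside ddply() the same steps run once per ID, which is why each row gets its own string.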
For your second question, it seems like you want to get the average frequency when the frequency is greater than 0. If that's the case, you can do this:
apply(table(df), 2, function(x) mean(x[x > 0]))
#   a   b   c   f   m   M   s   y
# 1.5 2.0 2.0 1.0 1.0 1.0 2.0 1.0
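As a check on the value for a, the same computation can be spelled out for one event in base R. This is only a sketch: it rebuilds the question's data, and the names counts and a_counts are made up for illustration.

```r
ID <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
        "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event <- c("a","b","b","M","s","f","y","b","a",
           "a","a","a","s","c","c","b","m","a")
df <- data.frame(ID, event)

# Cross-tabulate: rows are IDs, columns are events, cells are counts
counts <- table(df$ID, df$event)

# Counts of event "a" per ID; a zero marks an ID where "a" never occurs
a_counts <- counts[, "a"]
print(a_counts)

# Average over only the IDs where "a" occurs: (1 + 3 + 1 + 1) / 4
a_mean <- mean(a_counts[a_counts > 0])
print(a_mean)
```

Dropping the zeros before averaging is what makes the denominator 4 (the IDs that contain a) rather than 6 (all IDs).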
Update
If you want to do that last part for each level of a third variable and still use ddply(), you could do the following:
df1 <- rbind(df, df)
df1$cat <- rep(c("a", "b"), each = nrow(df))
ddply(df1, .(cat), function(y) apply(table(y), 2, function(x) mean(x[x > 0])))
#   cat   a b c f m M s y
# 1   a 1.5 2 2 1 1 1 2 1
# 2   b 1.5 2 2 1 1 1 2 1
Upvotes: 3