Reputation: 21
I have a dataframe that looks like this:
pred1 pred2 pred3 exp
a b c 0
a d c 0
a b c 1
What I would like to do is to first get all unique combinations of pred1-3, write them to an additional table, add a column for the frequency of each combination, and add another column that gives the proportion of the value 1 of exp (which can only be 0 or 1) for each combination. Something like this:
pred1 pred2 pred3 freq exp_prop
a b c 2 0.5
a d c 1 0
The first three steps turned out to be really easy with plyr:
ddply(df, .(pred1, pred2, pred3), summarise, freq=length(exp))
or shorter
count(df[,c(pred1, pred2, pred3)])
But I just can't figure out how to get the proportions of exp in.
Upvotes: 2
Views: 1546
Reputation: 81683
You are almost done. Just add exp_prop = mean(exp)
to the ddply
command:
ddply(df, .(pred1, pred2, pred3), summarise,
freq = length(exp), exp_prop = mean(exp))
pred1 pred2 pred3 freq exp_prop
1 a b c 2 0.5
2 a d c 1 0.0
Upvotes: 1
Reputation: 6094
# read in your data
x <-
read.table(text="pred1 pred2 pred3 exp
a b c 0
a d c 0
a b c 1" , h = T)
library(sqldf)
sqldf( "select pred1, pred2, pred3, count(*) as numtimes, avg( exp ) as prop from x group by pred1, pred2, pred3" )
###### alternative:
# write all the column names according to some pattern
cols <- paste0("pred" , 1:3 , collapse = "," )
# save your data frame to another object
y <-
sqldf(
paste(
"select" ,
cols ,
" , count(*) as numtimes, avg( exp ) as prop from x group by" ,
cols
)
)
# print to screen
y
Upvotes: 0