user1554977
user1554977

Reputation: 21

Get unique value combinations with summary count of another variable

I have a dataframe that looks like this:

pred1 pred2 pred3 exp
a     b     c     0
a     d     c     0
a     b     c     1

What I would like to do is to first get all unique combinations of pred1-3, write them to an additional table, add a column for the frequency of each combination, and add another column that gives the proportion of the value 1 of exp (which can only be 0 or 1) for each combination. Something like this:

pred1 pred2 pred3 freq exp_prop
a     b     c     2    0.5
a     d     c     1    0

The first three steps turned out to be really easy with plyr:

ddply(df, .(pred1, pred2, pred3), summarise, freq=length(exp))

or shorter

count(df[,c(pred1, pred2, pred3)])

But I just can't figure out how to get the proportions of exp in.

Upvotes: 2

Views: 1546

Answers (2)

Sven Hohenstein
Sven Hohenstein

Reputation: 81683

You are almost done. Just add exp_prop = mean(exp) to the ddply command:

ddply(df, .(pred1, pred2, pred3), summarise,
      freq = length(exp), exp_prop = mean(exp))

  pred1 pred2 pred3 freq exp_prop
1     a     b     c    2      0.5
2     a     d     c    1      0.0

Upvotes: 1

Anthony Damico
Anthony Damico

Reputation: 6094

# read in your data
x <- 
read.table(text="pred1 pred2 pred3 exp
a     b     c     0
a     d     c     0
a     b     c     1" , h = T)

library(sqldf)
sqldf( "select pred1, pred2, pred3, count(*) as numtimes, avg( exp ) as prop from x group by pred1, pred2, pred3" )

###### alternative:

# write all the column names according to some pattern
cols <- paste0("pred" , 1:3 , collapse = "," )

# save your data frame to another object
y <-
    sqldf( 
        paste( 
            "select" , 
            cols  , 
            " , count(*) as numtimes, avg( exp ) as prop from x group by" , 
            cols 
        ) 
    )

# print to screen
y

Upvotes: 0

Related Questions