Doing counts for a column of a dataframe in R

Question

I have a data frame "samp" with a column (let's call it "rating") that takes one of several values (say "good", "medium", or "bad").

I would like to group by several other columns, count how often "good", "medium" and "bad" occur in each group, and report those counts in new columns. (So maybe col1 is movie year and col2 is genre, and then there should be three more columns telling you how many of each type of rating there were for each year and genre.) My first attempt was:

 ddply(samp,c("col1","col2"), summarize, 
       good=table(samp$rating)["good"],
       medium=table(samp$rating)["medium"],
       bad=table(samp$rating)["bad"])

The problem is (I think) that the expressions I'm passing are not computed on the groups that ddply produces; they always use the whole of samp, so every group gets the same values. How can I write them so that they are computed per group?

I tried using an anonymous function:

 ddply(samp,c("col1","col2"), summarize, 
       good=function(df)table(df$rating)["good"],
       medium=function(df)table(df$rating)["medium"],
       bad=function(df)table(df$rating)["bad"])

I can never get it to work, though. The error I have seen most often from this is:

 Error in output[[var]][rng] <- df[[var]] : 
 incompatible types (from closure to logical) in subassignment type fix

So lay it on me. What's the ridiculously simple solution that did not turn up while I blundered around trying 948506 combinations of ddply and table? Thank you.

Sven Hohenstein · Accepted Answer

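Just remove all instances of samp$ inside ddply and it will work. With summarize, each expression is evaluated within the current group's subset, so a bare rating refers to that group's column, whereas samp$rating always refers to the full data frame: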

ddply(samp,c("col1","col2"), summarize, 
  good=table(rating)["good"],
  medium=table(rating)["medium"],
  bad=table(rating)["bad"])
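
To see the per-group evaluation in action, here is a minimal sketch on made-up data. The toy values in samp are my own assumption, not the asker's data, and I have swapped table(rating)["good"] for sum(rating == "good"), which is a variation on the answer above rather than the answer itself. The reason for the swap: if rating is a character column and a value never occurs in a group, indexing the table by that name gives NA, while the sum gives 0. (If rating is a factor, table() keeps all levels and this should not matter.)

```r
library(plyr)  # provides ddply() and summarize()

# Hypothetical toy data; only the column names come from the question.
samp <- data.frame(
  col1   = c(2001, 2001, 2001, 2002, 2002),
  col2   = c("drama", "drama", "comedy", "drama", "drama"),
  rating = c("good", "bad", "good", "medium", "medium"),
  stringsAsFactors = FALSE
)

# Bare `rating` is looked up in each group's subset, not in all of samp.
# sum(rating == "x") counts matches and yields 0 when a value is absent.
counts <- ddply(samp, c("col1", "col2"), summarize,
  good   = sum(rating == "good"),
  medium = sum(rating == "medium"),
  bad    = sum(rating == "bad"))

print(counts)
```

The result has one row per (col1, col2) combination present in the data, with the three count columns appended.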
