Reputation: 786
I have a data frame "samp" with a column (let's call it "rating") that takes one of several values (say, "good", "medium", or "bad").
I would like to group by several other columns, count how often "good", "medium", and "bad" occur within each group, and report those counts in new columns. (For example, col1 might be the movie year and col2 the genre; there would then be three more columns giving the number of each rating for each year and genre.)
ddply(samp,c("col1","col2"), summarize,
good=table(samp$rating)["good"],
medium=table(samp$rating)["medium"],
bad=table(samp$rating)["bad"])
The problem, I think, is that the expressions I'm defining are not evaluated on the groups ddply produces; they are just constants computed from all of samp. How can I define them so that they are functions of each group?
I also tried anonymous functions:
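The diagnosis can be checked outside of ddply with base R alone. The sketch below uses made-up example data in the shape the question describes (the values are illustrative, not the asker's real data). samp$rating always refers to the entire column, so table(samp$rating)["good"] yields the same number no matter which group is being summarized, whereas a per-group count has to be computed on that group's rows only.

```r
# Illustrative data in the shape described in the question
samp <- data.frame(rating = c("bad", "medium", "good", "bad", "medium", "good"),
                   col1   = c(2007, 2010, 2007, 2009, 2010, 2010),
                   col2   = c("fiction", "fiction", "fiction", "drama", "drama", "drama"))

# What the summarize() arguments in the question compute: the whole column, every time
overall_good <- table(samp$rating)["good"]

# What is actually wanted: the count within one (col1, col2) group, e.g. 2009 / drama
grp <- samp[samp$col1 == 2009 & samp$col2 == "drama", ]
group_good <- sum(grp$rating == "good")

print(overall_good)
print(group_good)
```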
ddply(samp,c("col1","col2"), summarize,
good=function(df)table(df$rating)["good"],
medium=function(df)table(df$rating)["medium"],
bad=function(df)table(df$rating)["bad"])
I can't get it working, though. The error I get most often is:
Error in output[[var]][rng] <- df[[var]] :
incompatible types (from closure to logical) in subassignment type fix
So lay it on me. What's the ridiculously simple solution that did not turn up while I blundered around trying 948506 combinations of ddply and table? Thank you.
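The error most likely comes from summarize storing whatever each argument evaluates to as a column value: function(df) ... evaluates to a function object (a closure), not a count, and a closure cannot be placed in a data frame column. If you do want to write the per-group logic as a function of a data frame, something has to call that function on each group. In plyr, that means passing it as ddply's function argument instead of going through summarize. The sketch below does the same thing in base R with split() and lapply() so that it runs without plyr; the helper name count_ratings and the example data are hypothetical.

```r
# Illustrative data in the shape described in the question
samp <- data.frame(rating = c("bad", "medium", "good", "bad", "medium", "good"),
                   col1   = c(2007, 2010, 2007, 2009, 2010, 2010),
                   col2   = c("fiction", "fiction", "fiction", "drama", "drama", "drama"))

# Hypothetical helper: receives ONE group's rows and returns a one-row data frame
count_ratings <- function(df) {
  data.frame(col1   = df$col1[1],
             col2   = df$col2[1],
             good   = sum(df$rating == "good"),
             medium = sum(df$rating == "medium"),
             bad    = sum(df$rating == "bad"))
}

# Split the rows into (col1, col2) groups, dropping combinations that never occur
pieces <- split(samp, list(samp$col1, samp$col2), drop = TRUE)

# Call the helper on every group, then stack the one-row results
counts <- do.call(rbind, lapply(pieces, count_ratings))
rownames(counts) <- NULL
print(counts)
```

With plyr loaded, a variant of the helper that returns only the three counts could be handed to ddply directly as its function argument, since ddply adds the grouping columns to the result itself.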
Upvotes: 0
Views: 393
Reputation: 4941
Example data:
samp <- data.frame(rating=c("bad","medium","good","bad","medium","good"),
col1=c(2007,2010,2007,2009,2010,2010),
col2=c("fiction","fiction","fiction","drama","drama","drama"))
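Code (don't prefix the column names with samp$; inside summarize, a bare name such as rating refers to that column within the current group, whereas samp$rating always refers to the whole data frame):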
library(plyr)
ddply(samp,c("col1","col2"), summarize,
good=sum(rating == "good"),
medium=sum(rating == "medium"),
bad=sum(rating == "bad"))
Output:
col1 col2 good medium bad
1 2007 fiction 1 0 1
2 2009 drama 0 0 1
3 2010 drama 1 1 0
4 2010 fiction 0 1 0
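For comparison, the same sum-of-a-logical idea can be expressed with base R's aggregate(), without plyr: turn each rating into a TRUE/FALSE indicator column, then sum the indicators within each (col1, col2) group, since TRUE counts as 1. This is a swapped-in base-R sketch rather than part of the answer above, and aggregate() may list the groups in a different order than ddply does.

```r
# Example data from the answer above
samp <- data.frame(rating = c("bad", "medium", "good", "bad", "medium", "good"),
                   col1   = c(2007, 2010, 2007, 2009, 2010, 2010),
                   col2   = c("fiction", "fiction", "fiction", "drama", "drama", "drama"))

# One logical indicator column per rating value
indicators <- data.frame(good   = samp$rating == "good",
                         medium = samp$rating == "medium",
                         bad    = samp$rating == "bad")

# Sum the indicators within each (col1, col2) group; TRUE counts as 1
counts <- aggregate(indicators,
                    by  = list(col1 = samp$col1, col2 = samp$col2),
                    FUN = sum)
print(counts)
```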
Upvotes: 1
Reputation: 81693
Just remove all instances of samp$ inside the ddply call and it will work:
library(plyr)
ddply(samp,c("col1","col2"), summarize,
good=table(rating)["good"],
medium=table(rating)["medium"],
bad=table(rating)["bad"])
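One caveat with indexing table() by name: table() only tabulates the values it sees, so if rating is a character column, a group with no "medium" rating gets NA rather than 0 from table(rating)["medium"]. A factor carries its full set of levels, so table() reports 0 for the missing ones. Before R 4.0.0, data.frame() converted strings to factors by default (stringsAsFactors = TRUE), so rating would usually have been a factor; from 4.0.0 on it stays character unless converted. A minimal base-R check, using a made-up group:

```r
# Ratings of one hypothetical group that contains no "medium"
x <- c("bad", "good")

# Character vector: "medium" is not among the table's names, so indexing gives NA
as_character <- table(x)["medium"]

# Factor with explicit levels: every level is tabulated, so "medium" is 0
as_factor <- table(factor(x, levels = c("good", "medium", "bad")))["medium"]

print(as_character)
print(as_factor)
```

Inside the ddply call, using table(factor(rating, levels = c("good", "medium", "bad"))) in place of table(rating) would therefore give zeros instead of NA for missing ratings.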
Upvotes: 2