I have a data.frame with task assignments from a ticket tracking system.
Assignments <- data.frame('Task'=c(1, 1, 2, 3, 2, 2, 1), 'Assignee'=c('Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Chuck', 'Alice'))
I need to summarize the data for some monthly reports. Here is what I have so far:
library(plyr)  # provides ddply() and summarize()
ddply(Assignments, 'Task',
      summarize,
      Assignee.Count=length(Assignee),
      Unique.Assignees.Involved=length(unique(Assignee)),
      Assignees.Involved=paste(Assignee, collapse=", "))
And that nets me:
  Task Assignee.Count Unique.Assignees.Involved Assignees.Involved
1    1              3                         2  Alice, Bob, Alice
2    2              3                         3  Alice, Bob, Chuck
3    3              1                         1              Alice
In the Assignees.Involved column, I'd like to summarize the data further. For Task 1 (the first row), I'd like it to say "Alice 2, Bob 1". It feels like I need some other plyr method to take the Assignees for each task, sort them, run them through the rle function, and paste the lengths and values back together. I can't figure out how to do that within the summarize function.
Here is the result for the entire data.frame:
paste(rle(as.vector(sort(Assignments$Assignee)))$values,
      rle(as.vector(sort(Assignments$Assignee)))$lengths,
      sep=" ", collapse=", ")
Results:
[1] "Alice 4, Bob 2, Chuck 1"
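To make the sort / rle / paste chain easier to follow, here is a minimal sketch that splits the one-liner above into named steps. The intermediate names (sorted, runs, summary_text) are mine and purely illustrative, and stringsAsFactors = FALSE is an assumption so that the column is plain character regardless of R version.

```r
# Same sample data as in the question; stringsAsFactors = FALSE is an
# assumption here so Assignee is a character vector in any R version.
Assignments <- data.frame(
  Task = c(1, 1, 2, 3, 2, 2, 1),
  Assignee = c("Alice", "Bob", "Alice", "Alice", "Bob", "Chuck", "Alice"),
  stringsAsFactors = FALSE
)

# Step 1: sort so that identical names sit next to each other.
sorted <- sort(Assignments$Assignee)

# Step 2: run-length encode; $values holds each name, $lengths its count.
runs <- rle(sorted)

# Step 3: pair each name with its count, then join the pairs with ", ".
summary_text <- paste(runs$values, runs$lengths, sep = " ", collapse = ", ")
print(summary_text)
```

Computing rle once and reusing the result avoids sorting and encoding the vector twice.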
Upvotes: 1
I figured this out while posting the question :)
The trick is that within the expressions passed to the summarize function, you refer to columns as barewords: Assignments$Assignee should be written as just Assignee, with no data frame prefix and no quotes.
So once I had figured out that the rle function could get me where I needed to be, I had what I needed.
ddply(Assignments, 'Task',
      summarize,
      Assignee.Count=length(Assignee),
      Unique.Assignees.Involved=length(unique(Assignee)),
      Assignments=paste(rle(as.vector(sort(Assignee)))$values,
                        rle(as.vector(sort(Assignee)))$lengths,
                        sep=" ", collapse=", "))
Gives:
  Task Assignee.Count Unique.Assignees.Involved             Assignments
1    1              3                         2          Alice 2, Bob 1
2    2              3                         3 Alice 1, Bob 1, Chuck 1
3    3              1                         1                 Alice 1
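As a possible alternative, the per-task "name count" string can also be built with base R's table() in place of sort plus rle. The sketch below is not the approach from the answer above. The helper name count_summary is hypothetical, and it uses tapply so that it runs without plyr.

```r
# Same sample data as in the question (stringsAsFactors = FALSE is an assumption).
Assignments <- data.frame(
  Task = c(1, 1, 2, 3, 2, 2, 1),
  Assignee = c("Alice", "Bob", "Alice", "Alice", "Bob", "Chuck", "Alice"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tabulate the names, then paste "name count" pairs.
# table() orders its entries by sorted name, so no explicit sort is needed.
count_summary <- function(x) {
  counts <- table(as.character(x))
  paste(names(counts), as.vector(counts), sep = " ", collapse = ", ")
}

# Apply the helper to the assignees of each task using base R only.
per_task <- tapply(Assignments$Assignee, Assignments$Task, count_summary)
print(per_task)
```

I would expect the same helper to work inside the ddply call as Assignments=count_summary(Assignee), though I have only sketched the base R form here.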
Upvotes: 1