Reputation: 9803
I have a data frame with the following data:
Count  Amount  Org  Bank
------------------------------------------
1      100     ABC  Chase
15     76      DEF  American Express
...
When I run ddply using:
result1 <- ddply(df, 4, count = sum(as.numeric(df[[1]])), amt = sum(as.numeric(df[[2]])))
the result, result1, has the same values (count and amt) for all rows, i.e.:
description       count  amt
Chase             900    432087
American Express  900    432087
...
which is definitely not the case. Somehow, it seems like the last sum() value being calculated is applied to all the rows. Am I missing something here?
Upvotes: 0
Views: 1298
Reputation: 42942
There are a few problems here:
You are getting the same (wrong) result because you are referring back to the original data frame df in the arguments to ddply, e.g. df[[1]]. ddply doesn't work like that: use column names directly, e.g. Amount and Count, so that they are evaluated within each group.
You are missing the .fun function argument to ddply; in this case summarize is appropriate. (I honestly don't know how your code worked at all without this.)
You are using an undocumented way (4) to select group columns in the .variables argument. Try .(Bank) or c("Bank") instead.
This should work:
library(plyr)
ddply(df, .(Bank), summarize,
      count = sum(as.numeric(Count)),
      amt = sum(as.numeric(Amount)))
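To see the difference for yourself, here is a small self-contained sketch. The sample rows are made up for illustration (only the column names come from the question), and it assumes the plyr package is installed. It contrasts the sum over the whole data frame, which is what df[[1]] gives you, with the per-bank sums from summarize.

```r
library(plyr)

# Hypothetical sample data; only the column names come from the question.
df <- data.frame(
  Count  = c(1, 15, 4, 10),
  Amount = c(100, 76, 50, 24),
  Org    = c("ABC", "DEF", "GHI", "JKL"),
  Bank   = c("Chase", "American Express", "Chase", "American Express"),
  stringsAsFactors = FALSE
)

# Referring to df directly ignores the grouping: this is the sum over
# every row, i.e. the single value that ends up repeated for each bank.
total_count <- sum(as.numeric(df[[1]]))

# Bare column names are evaluated by summarize within each Bank group.
result <- ddply(df, .(Bank), summarize,
                count = sum(as.numeric(Count)),
                amt   = sum(as.numeric(Amount)))
print(result)
```

With this data total_count is 30, while result has one row per bank, each with its own count and amt.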
Upvotes: 7