Reputation: 347
I'm looking to summarize data similar to the ToothGrowth data in the datasets package.
The output I want looks like this:
  supp   len  half   one   two
1   OJ 619.9 132.3 227.0 260.6
2   VC 508.9  79.8 167.7 261.4
That is, the sum of lengths split by dose and supplement type. My colleague gets this output with R version 2.15.1 and plyr_1.7.1 using the following code.
library(datasets)
library(plyr)  # provides ddply() and summarize()
x <- ToothGrowth
test <- ddply(x, c("supp"), summarize,
              len  = sum(len, na.rm = TRUE),
              half = sum(len[dose == 0.5], na.rm = TRUE),
              one  = sum(len[dose == 1], na.rm = TRUE),
              two  = sum(len[dose == 2], na.rm = TRUE))
There are no NAs in the ToothGrowth data but there are in the real dataset.
I get the following output with R version 3.0.0 and plyr_1.8. I can provide the full sessionInfo() for both if that would be useful.
  supp   len  half one two
1   OJ 619.9 619.9   0   0
2   VC 508.9 508.9   0   0
No error or warning is raised. In my data I have only three 'doses' but many 'supplement types'. Where there are no values in the half category, the whole sum is put into one or two instead.
Is there a way to do this that produces consistent output across versions?
Thanks for your help.
Upvotes: 1
Views: 1147
Reputation: 173677
summarise was updated to "mutate by default", so to speak. So in the last three variables, when you refer to len, you are actually referring to the len variable you just created, which is only a single value. Call it something else:
> test <- ddply(x, c("supp"), summarize,
+               len1 = sum(len, na.rm = TRUE),
+               half = sum(len[dose == 0.5], na.rm = TRUE),
+               one  = sum(len[dose == 1], na.rm = TRUE),
+               two  = sum(len[dose == 2], na.rm = TRUE))
> test
  supp  len1  half   one   two
1   OJ 619.9 132.3 227.0 260.6
2   VC 508.9  79.8 167.7 261.4
(I originally mistakenly called this a change in ddply.) As for why, I suppose because it seemed like it would be convenient, and people requested the change. Here is a link to the issue raised and the subsequent patch.
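If the goal is output that does not depend on the plyr version at all, one option is to compute the sums in base R. The following is only a sketch, assuming the data has the same supp, dose, and len columns as ToothGrowth; the names by_dose and total are made up for illustration. Because nothing here goes through summarise, a newly created column cannot shadow the original len.

```r
# Base R only: no plyr, so summarise's evaluation order is not a factor.
x <- datasets::ToothGrowth

# Sum of len for each supp (rows) and dose (columns).
# Combinations with no rows come back as NA rather than 0.
by_dose <- tapply(x$len, list(x$supp, x$dose), sum, na.rm = TRUE)

# Overall sum of len per supp, across all doses.
total <- tapply(x$len, x$supp, sum, na.rm = TRUE)

# Assemble a data frame in the layout asked for in the question.
# The column names "0.5", "1", "2" come from the dose values in ToothGrowth.
test <- data.frame(
  supp = rownames(by_dose),
  len  = as.vector(total[rownames(by_dose)]),
  half = as.vector(by_dose[, "0.5"]),
  one  = as.vector(by_dose[, "1"]),
  two  = as.vector(by_dose[, "2"])
)
test
```

On ToothGrowth this should give the same numbers as the table in the question. With real data, note that a supplement type with no rows for a given dose shows NA in that column, which may be preferable to a silent 0.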
Upvotes: 7