Reputation: 159
I would like to identify, within each group, the case that is just bigger than the average plus one standard deviation. For example, using Species as the group and Petal.Width as the variable in the iris data.
What's the best way to do it? Creating a function?
I made this, but I cannot relate the result back to the original data to identify the case.
data(iris)
library(plyr)
petal.wid.avg <- ddply(iris, .(Species), function(df)
return(c(petal.wid.avg=mean(df$Petal.Width), petal.wid.sd=sd(df$Petal.Width)))
)
petal.wid.avg$avgsd <- petal.wid.avg$petal.wid.avg + petal.wid.avg$petal.wid.sd
petal.wid.avg
Upvotes: 1
Views: 95
Reputation: 121127
There are many ways of doing this, but the ave function is perhaps the easiest.
iris$big <- with(iris,
ave(Petal.Width, Species, FUN = function(x) x > mean(x) + sd(x))
)
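One caveat with the ave approach: ave writes the results of FUN back into a copy of its first argument, so with a numeric Petal.Width the flag comes back as 0/1 rather than TRUE/FALSE, and subset() expects a logical condition. A minimal sketch of converting it, assuming a working copy named iris2 (a hypothetical name used only for illustration):

```r
# Work on a copy so the built-in dataset is left untouched
# (iris2 is a hypothetical name, not part of the original answer).
iris2 <- datasets::iris

# ave() returns a vector of the same type as Petal.Width (numeric),
# so the comparison results are stored as 0/1.
flag <- with(iris2,
  ave(Petal.Width, Species, FUN = function(x) x > mean(x) + sd(x))
)

# Convert to logical before using it as a filter condition.
iris2$big <- as.logical(flag)

# Count flagged rows per species.
table(iris2$Species, iris2$big)
```

With the logical column in place, subset(iris2, big) should behave the same way as with the plyr version.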
Here's the plyr solution:
iris <- ddply(
datasets::iris,
.(Species),
transform,
big = Petal.Width > mean(Petal.Width) + sd(Petal.Width)
)
Based on the comments, here's the rest of the solution.
iris <- subset(iris, big)
iris <- ddply(
iris,
.(Species),
transform,
smallest = Petal.Width == min(Petal.Width)
)
(iris <- subset(iris, smallest))
Note that where you have ties (as in this dataset), you won't get a unique "just bigger" row.
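If exactly one row per species is wanted despite the ties, one option is to pick the first qualifying row. This is only a sketch in base R, under the assumption that breaking ties by row order is acceptable; just_bigger is a hypothetical name:

```r
# For each species, keep the rows above mean + sd, then take a single row
# with the smallest Petal.Width among them. which.min() returns the index
# of the first minimum, so ties are broken by row order (an assumption
# about what "unique" should mean here).
just_bigger <- do.call(rbind, lapply(
  split(datasets::iris, datasets::iris$Species),
  function(df) {
    big <- df[df$Petal.Width > mean(df$Petal.Width) + sd(df$Petal.Width), ]
    big[which.min(big$Petal.Width), ]
  }
))

just_bigger
```

The row names of the result combine the species with the original row number, which gives the link back to the original data.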
Upvotes: 4