José Bustos

Reputation: 159

Identifying case by groups in a data.frame

Within each group, I would like to identify the case that is just bigger than the average plus one standard deviation. For example, using Species as the group and Petal.Width as the variable in the iris data.

What's the best way to do it? Creating a function?

I made this, but I cannot relate the result back to the original data to identify the case.

data(iris)
library(plyr)
petal.wid.avg <- ddply(iris, .(Species), function(df)
  return(c(petal.wid.avg=mean(df$Petal.Width), petal.wid.sd=sd(df$Petal.Width)))
)
petal.wid.avg$avgsd <- petal.wid.avg$petal.wid.avg + petal.wid.avg$petal.wid.sd
petal.wid.avg

Upvotes: 1

Views: 95

Answers (1)

Richie Cotton

Reputation: 121127

There are many ways of doing this, but the ave function is perhaps the easiest.

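iris$big <- with(iris, 
  # ave() writes the results back into a numeric vector, so the flags come
  # out as 0/1; as.logical() turns them back into TRUE/FALSE
  as.logical(ave(Petal.Width, Species, FUN = function(x) x > mean(x) + sd(x)))
)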

Here's the plyr solution:

iris <- ddply(
  datasets::iris, 
  .(Species), 
  transform, 
  big = Petal.Width > mean(Petal.Width) + sd(Petal.Width) 
)

Based on the comments, here's the rest of the solution.

iris <- subset(iris, big)
iris <- ddply(
  iris,
  .(Species),
  transform,
  smallest = Petal.Width == min(Petal.Width)
)
(iris <- subset(iris, smallest))

Note that where you have ties (as in this dataset), you won't get a unique "just bigger" row.
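
If it helps to see the link back to the original rows, the same idea can be sketched in base R without plyr. This is only an illustration: the names iris_orig, threshold, above and just_bigger are made up here, and it assumes every species has at least one value above its threshold (which is the case for iris).

```r
# Work on a fresh copy so the row names still match the original data
iris_orig <- datasets::iris

# Per-row threshold: group mean + group sd, aligned with the original rows
threshold <- with(
  iris_orig,
  ave(Petal.Width, Species, FUN = function(x) mean(x) + sd(x))
)

# Keep only the rows above their own group's threshold
above <- iris_orig[iris_orig$Petal.Width > threshold, ]

# Within each species, keep the row(s) with the smallest remaining width
group_min <- ave(above$Petal.Width, above$Species, FUN = min)
just_bigger <- above[above$Petal.Width == group_min, ]

# Row names identify the cases in the original data;
# a count above 1 for a species means there is a tie
rownames(just_bigger)
table(just_bigger$Species)
```

Because the subsetting keeps the original row names, rownames(just_bigger) can be used to look the cases up in the untouched data.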

Upvotes: 4
