How can I identify and summarize sets of data from matching groups in a dataframe?

Question

Here is an example dataframe:

set.seed(0)
x1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
x2 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1)
x3 <- c(1, 1, 2, 2, 4, 1, 1, 2, 1)
n  <- c(1, 1, 1, 5, 5, 1, 1, 1, 1)
y <- rnorm(9)

mydf <- data.frame(x1, x2, x3, n, y)

What I would like to do is

identify rows with n=1 and which share identical values of (x1, x2, x3)
return a single row for each subset with y = mean(y) and n = length(y)
keep other rows the same.

for example, the new dataframe would be

x1 <- c(1,            1,    1,    1,    2,                 2)
x2 <- c(1,            0,    0,    0,    1,                 1)
x3 <- c(1,            2,    2,    4,    1,                 2)
n  <- c(2,            1,    5,    5,    3,                 1)
y  <- c(mean(y[1:2]), y[3], y[4], y[5], mean(y[c(6:7,9)]), y[8])

newdf <- data.frame(x1, x2, x3, n, y)

I can figure this out with conditionals and loops, but I would prefer to learn more elegant way to do this.

Brian Diggs · Accepted Answer

By "identical values in other columns", I take it you mean that each subset is defined by the same value of x1 in each of the rows of the subset, not that x1 is equal to x2. Thanks for the example to see what you meant.

library("plyr")

To get parts one and two

ddply(mydf[mydf$n==1,], .(x1, x2, x3), summarise, n = length(y), y = mean(y))

This can be rbind-ed with the part of mydf where n!=1 to get what you said

rbind(
  ddply(mydf[mydf$n==1,], .(x1, x2, x3), summarise, n = length(y), y = mean(y)),
  mydf[mydf$n!=1,]
)

This doesn't have the same order as you listed. If that is really important, you can add some auxiliary sorting variables.

mydf$order = seq(length=nrow(mydf))
newdf <- rbind(
  ddply(mydf[mydf$n==1,], .(x1, x2, x3), summarise, 
    n = length(y), y = mean(y), order=min(order)),
  mydf[mydf$n!=1,]
)
newdf <- newdf[order(newdf$order),]
newdf$order <- NULL

How can I identify and summarize sets of data from matching groups in a dataframe?

Answers (1)

Related Questions