Reputation: 391

Aggregate by factor levels, keeping other variables in the resulting data frame

I'm trying to calculate the minimum values of a numeric column for each level of a factor, while keeping values of another factor in the resulting data frame.

# dummy data
dat <- data.frame(
    code = c("HH11", "HH45", "JL03", "JL03", "JL03", "HH11"), 
    index = c("023434", "3377477", "3388595", "3377477", "1177777", "023434"), 
    value = c(24.1, 37.2, 78.9, 45.9, 20.0, 34.6)
    )

The result I want is the minimum of value for each level of code, keeping index in the resulting data frame.

# result I want:
#   code value    index
# 1 HH11  24.1   023434
# 2 HH45  37.2  3377477
# 3 JL03  20.0  1177777


# ddply attempt
library(plyr)
ddply(dat, ~ code, summarise, val = min(value))
#   code   val
# 1 HH11  24.1
# 2 HH45  37.2
# 3 JL03  20.0


# base R attempt
aggregate(value ~ code, dat, min)
#   code value
# 1 HH11  24.1
# 2 HH45  37.2
# 3 JL03  20.0

Upvotes: 19

Answers (5)

Chris

Reputation: 391

Well, a few minutes more searching would have gotten me there... this answer seems to do the trick:

merge(dat, aggregate(value ~ code, dat, min))

Upvotes: 0

jazzurro

Reputation: 23574

With the dplyr and data.table packages, you can do the following. You can get an index for the row having the minimum value for each group. You can use that in slice() if you use dplyr. You can achieve the same subsetting using .SD if you use data.table.

library(dplyr)
library(data.table)

dat %>%
group_by(code) %>%
slice(which.min(value))

#    code   index value
#  <fctr>  <fctr> <dbl>
#1   HH11  023434  24.1
#2   HH45 3377477  37.2
#3   JL03 1177777  20.0

setDT(dat)[, .SD[which.min(value)], by = code]

#   code   index value
#1: HH11  023434  24.1
#2: HH45 3377477  37.2
#3: JL03 1177777  20.0

Upvotes: 1

CArnold

Reputation: 525

If you had added in the index variable that would have done it.

library(plyr)

# ddply
ddply(dat, .(code,index), summarise, val = min(value))

# base R
aggregate(value ~ code + index, dat, min)

Upvotes: 0

thelatemail

Reputation: 93803

Just to show that there's always multiple ways to skin a cat:

Using ave to get the indexes of the minimum rows in each group:

dat[which(ave(dat$value,dat$code,FUN=function(x) x==min(x))==1),]

#  code   index value
#1 HH11  023434  24.1
#2 HH45 3377477  37.2
#5 JL03 1177777  20.0

This method also has the potential benefit of returning multiple rows per code group in the instance of multiple values being the minimum.

And another method using by:

do.call(rbind,
  by(dat, dat$code, function(x) cbind(x[1,c("code","index")],value=min(x$value)))
)
#      code   index value
# HH11 HH11  023434  24.1
# HH45 HH45 3377477  37.2
# JL03 JL03 3388595  20.0

Upvotes: 3

CHP

Reputation: 17189

You need to use merge on result of aggregate and original data.frame

merge(aggregate(value ~ code, dat, min), dat, by = c("code", "value"))
##   code value   index
## 1 HH11  24.1  023434
## 2 HH45  37.2 3377477
## 3 JL03  20.0 1177777

Upvotes: 20

Aggregate by factor levels, keeping other variables in the resulting data frame

Answers (5)

Related Questions