Reputation: 391
I'm trying to calculate the minimum values of a numeric column for each level of a factor, while keeping values of another factor in the resulting data frame.
# dummy data
dat <- data.frame(
code = c("HH11", "HH45", "JL03", "JL03", "JL03", "HH11"),
index = c("023434", "3377477", "3388595", "3377477", "1177777", "023434"),
value = c(24.1, 37.2, 78.9, 45.9, 20.0, 34.6)
)
The result I want is the minimum of value
for each level of code
, keeping index
in the resulting data frame.
# result I want:
# code value index
# 1 HH11 24.1 023434
# 2 HH45 37.2 3377477
# 3 JL03 20.0 1177777
# ddply attempt
library(plyr)
ddply(dat, ~ code, summarise, val = min(value))
# code val
# 1 HH11 24.1
# 2 HH45 37.2
# 3 JL03 20.0
# base R attempt
aggregate(value ~ code, dat, min)
# code value
# 1 HH11 24.1
# 2 HH45 37.2
# 3 JL03 20.0
Upvotes: 19
Views: 32888
Reputation: 391
Well, a few minutes more searching would have gotten me there... this answer seems to do the trick:
merge(dat, aggregate(value ~ code, dat, min))
Upvotes: 0
Reputation: 23574
With the dplyr
and data.table
packages, you can do the following. You can get an index for the row having the minimum value for each group. You can use that in slice()
if you use dplyr
. You can achieve the same subsetting using .SD
if you use data.table
.
library(dplyr)
library(data.table)
dat %>%
group_by(code) %>%
slice(which.min(value))
# code index value
# <fctr> <fctr> <dbl>
#1 HH11 023434 24.1
#2 HH45 3377477 37.2
#3 JL03 1177777 20.0
setDT(dat)[, .SD[which.min(value)], by = code]
# code index value
#1: HH11 023434 24.1
#2: HH45 3377477 37.2
#3: JL03 1177777 20.0
Upvotes: 1
Reputation: 525
If you had added in the index variable that would have done it.
library(plyr)
# ddply
ddply(dat, .(code,index), summarise, val = min(value))
# base R
aggregate(value ~ code + index, dat, min)
Upvotes: 0
Reputation: 93803
Just to show that there's always multiple ways to skin a cat:
Using ave
to get the indexes of the minimum rows in each group:
dat[which(ave(dat$value,dat$code,FUN=function(x) x==min(x))==1),]
# code index value
#1 HH11 023434 24.1
#2 HH45 3377477 37.2
#5 JL03 1177777 20.0
This method also has the potential benefit of returning multiple rows per code
group in the instance of multiple values being the minimum.
And another method using by
:
do.call(rbind,
by(dat, dat$code, function(x) cbind(x[1,c("code","index")],value=min(x$value)))
)
# code index value
# HH11 HH11 023434 24.1
# HH45 HH45 3377477 37.2
# JL03 JL03 3388595 20.0
Upvotes: 3
Reputation: 17189
You need to use merge
on result of aggregate
and original data.frame
merge(aggregate(value ~ code, dat, min), dat, by = c("code", "value"))
## code value index
## 1 HH11 24.1 023434
## 2 HH45 37.2 3377477
## 3 JL03 20.0 1177777
Upvotes: 20