Reputation: 3126
a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2))
a.3[r,]
This returns the index within each split group, not the row index for the entire data.frame. I'm trying to return the largest value of b.2 for each subgroup of a.2. How can I do this efficiently?
Upvotes: 9
Views: 8735
Reputation: 983
a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
With aggregate, you can get the maximum for each group in one line:
aggregate(a.3, by = list(a.3$a.2), FUN = max)
This produces the following output:
   Group.1 a.2 b.2
1        1   1  96
2        2   2  82
...
8        8   8  85
9        9   9  93
10      10  10  97
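Note that the call above applies max to every column, including a.2 itself, which is why a.2 simply repeats Group.1 in the output. A minimal sketch of the formula interface, which aggregates only b.2 (the seed and the name group_max are my own assumptions, added so the example is reproducible):

```r
set.seed(123)  # assumption: fixed seed, only so the sketch is reproducible
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

# Formula interface: take the max of b.2 within each level of a.2
group_max <- aggregate(b.2 ~ a.2, data = a.3, FUN = max)
group_max
```

This gives one row per group with just the key and its maximum. Like the original call, it returns the maximum values, not the matching rows of a.3.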
Upvotes: 0
Reputation: 3835
The ddply and ave approaches are both fairly resource-intensive, I think. ave fails by running out of memory for my current problem (67,608 rows, with four columns defining the unique keys). tapply is a handy choice, but what I generally need is to select the whole row holding the most extreme value of some column for each unique key (usually defined by more than one column). The best solution I've found is to sort and then use the negation of duplicated to select only the first row for each unique key. For the simple example here:
a <- sample(1:10,100,replace=T)
b <- sample(1:100,100,replace=T)
f <- data.frame(a, b)
sorted <- f[order(f$a, -f$b),]
highs <- sorted[!duplicated(sorted$a),]
I think the performance gains over ave or ddply, at least, are substantial. It is slightly more complicated for multi-column keys, but order will handle a whole bunch of things to sort on and duplicated works on data frames, so it's possible to continue using this approach.
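For the multi-column case, a rough sketch might look like the following. The column names k1, k2 and b, the seed, and the data are all made up for illustration, not taken from my real problem:

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Hypothetical data: two key columns (k1, k2) and one value column (b)
f <- data.frame(
  k1 = sample(1:3, 50, replace = TRUE),
  k2 = sample(c("x", "y"), 50, replace = TRUE),
  b  = sample(1:100, 50, replace = TRUE)
)

# Sort by both keys, then by descending b within each key combination
sorted <- f[order(f$k1, f$k2, -f$b), ]

# duplicated() on the key columns flags every row after the first
# per (k1, k2) pair; negating it keeps only the top row per pair
highs <- sorted[!duplicated(sorted[, c("k1", "k2")]), ]
highs
```

When several rows tie for the maximum, this keeps only one of them per key, since only the first row of each key combination survives.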
Upvotes: 10
Reputation: 23758
a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
The answer by Jonathan Chang gets you what you explicitly asked for, but I'm guessing that you want the actual row from the data frame.
sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]
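One caveat: when several rows in a group tie for the maximum, sel keeps all of them. A minimal sketch of reducing that to one row per group, if that is what you want (the seed and the names top and top_one are my own assumptions):

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

# TRUE for every row whose b.2 equals its group's maximum (ties included)
sel <- ave(a.3$b.2, a.3$a.2, FUN = max) == a.3$b.2
top <- a.3[sel, ]

# Keep only the first of any tied rows within each group
top_one <- top[!duplicated(top$a.2), ]
top_one
```

The rows of top_one stay in their original data frame order, not sorted by a.2.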
Upvotes: 6
Reputation: 25337
> a.2<-sample(1:10,100,replace=T)
> b.2<-sample(1:100,100,replace=T)
> tapply(b.2, a.2, max)
1 2 3 4 5 6 7 8 9 10
99 92 96 97 98 99 94 98 98 96
Upvotes: 1
Reputation: 3126
a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
m<-split(a.3,a.2)
u<-function(x){
a<-rownames(x)
b<-which.max(x[,2])
as.numeric(a[b])
}
r<-sapply(m,FUN=function(x) u(x))
a.3[r,]
This does the trick, albeit in a somewhat cumbersome way, but it lets me grab the rows holding the groupwise largest values. Any other ideas?
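A somewhat shorter variant of the same idea, sketched under the assumption that splitting the row numbers (rather than the data frame) is acceptable, so that no conversion of row names is needed. The seed and the name idx_by_group are hypothetical:

```r
set.seed(7)  # assumption: fixed seed, only so the sketch is reproducible
a.3 <- data.frame(a.2 = sample(1:10, 100, replace = TRUE),
                  b.2 = sample(1:100, 100, replace = TRUE))

# Split the row numbers of a.3 by group instead of splitting the data
idx_by_group <- split(seq_len(nrow(a.3)), a.3$a.2)

# Within each group, pick the row number whose b.2 is largest;
# which.max() returns the first position in case of ties
r <- sapply(idx_by_group, function(i) i[which.max(a.3$b.2[i])])

a.3[r, ]
```

Here r holds indices into the entire data frame, so a.3[r, ] returns the full rows directly.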
Upvotes: 1