DJack

Reputation: 4940

Summarize data frame based on condition

I have a data frame with three variables, ID, V1, and V2:

ID V1 V2 
1  A  10
1  B  5
1  D  1
2  C  9
2  E  8

I would like a new data frame that contains, for each ID, the row with the maximum value of V2. For the example above, the result would be:

ID V1 V2 
1  A  10
2  C  9
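
For anyone who wants to reproduce this, the example data can be built roughly as follows. This is only a sketch: the name mydf and the column types (numeric ID and V2, character V1) are assumptions, since the question shows only the printed table.

```r
# Example data from the question.
# The name `mydf` and the column types are assumptions, not given in the post.
mydf <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  V1 = c("A", "B", "D", "C", "E"),
  V2 = c(10, 5, 1, 9, 8),
  stringsAsFactors = FALSE
)
```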

Upvotes: 1

Views: 166

Answers (2)

Metrics

Reputation: 15458

Use ddply from the plyr package (assuming your data frame is named sample):

    library(plyr)
    # For each ID, keep the V1 from the row where V2 is largest, plus that V2
    ddply(sample, .(ID), summarize, V1 = V1[which.max(V2)], V2 = max(V2))
    #   ID V1 V2
    # 1  1  A 10
    # 2  2  C  9

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1

Reputation: 193547

This is somewhat clumsy, but it works:

mydf[with(mydf, ave(V2, ID, FUN = function(x) x == max(x))) == 1, ]
#   ID V1 V2
# 1  1  A 10
# 4  2  C  9

Less clumsy:

do.call(rbind, 
        by(mydf, mydf$ID, 
           FUN = function(x) x[which.max(x$V2), ]))
#   ID V1 V2
# 1  1  A 10
# 2  2  C  9
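
One difference between the two approaches is worth knowing: they behave differently when several rows in a group share the maximum. The ave() version keeps every tied row, while which.max() returns only the first. Here is a small sketch of that, using made-up data (the data frame tied is hypothetical and not from the question):

```r
# Hypothetical data with a tie for the maximum V2 within ID 1
tied <- data.frame(
  ID = c(1, 1, 2),
  V1 = c("A", "B", "C"),
  V2 = c(10, 10, 9),
  stringsAsFactors = FALSE
)

# ave() approach: keeps every row whose V2 equals its group's maximum
all_max <- tied[with(tied, ave(V2, ID, FUN = function(x) x == max(x))) == 1, ]

# by() + which.max() approach: keeps only the first maximal row per group
first_max <- do.call(rbind,
                     by(tied, tied$ID,
                        FUN = function(x) x[which.max(x$V2), ]))

nrow(all_max)    # both tied rows for ID 1 are kept, plus the row for ID 2
nrow(first_max)  # exactly one row per ID
```

If you need exactly one row per ID, the which.max() version is probably the safer choice; if ties should all be reported, the ave() version does that.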

Upvotes: 1
