no_one
no_one

Reputation: 108

find max value in a data.frame by group and show its date as year-month-day

Here my dataframe:

df = read.csv(text = '"Date","Value","ID","WY"
                  1975-02-01,-1.16543693088,"Tweed",1975
                  1975-03-01,-1.05372283483,"Tweed",1975
                  1975-04-01,-1.06632370439,"Tweed",1975
                  1975-05-01,-1.18903485356,"Tweed",1975
                1992-05-01,-1.04737467143,"Ouse",1992
                1992-06-01,-1.4058281451,"Ouse",1992
                1992-07-01,-1.13608647243,"Ouse",1992
                1992-08-01,-0.802566581309,"Ouse",1992
                1992-09-01,-0.551433852821,"Ouse",1992
                1992-10-01,-0.625997598552,"Ouse",1993
                1992-11-01,-0.483559758609,"Ouse",1993
                1992-12-01,-0.792013395632,"Ouse",1993
                1993-01-01,-0.754618121962,"Ouse",1993
                1993-02-01,-1.2504282139,"Ouse",1993
                1996-01-01,-0.945410385985,"Trent",1996
                1996-02-01,-0.84249575782,"Trent",1996
                1996-03-01,-1.10332425045,"Trent",1996
                1996-04-01,-1.22634133042,"Trent",1996
                1996-05-01,-1.2335181635,"Trent",1996
                1996-06-01,-1.23451130358,"Trent",1996
                1996-07-01,-1.25902677738,"Trent",1996
                1996-08-01,-1.13068733413,"Trent",1996', header = TRUE)

I need to find the annual maximum value for each ID and WY group.

The following code do the trick very easily but its output only shows the year of each annual maximum whereas I am interested also in the relative month and day:

df_AMAX = aggregate(df$Value, by = list(df$WY, df$ID), max)
colnames(df_AMAX) = c('Date', 'ID', 'Value')
print(df_AMAX)

 Date    ID      Value
1 1992  Ouse -0.5514339
2 1993  Ouse -0.4835598
3 1996 Trent -0.8424958
4 1975 Tweed -1.0537228

My output should be:

 Date           ID      Value
1 1992-09-01  Ouse -0.5514339
2 1993-11-01  Ouse -0.4835598
3 1996-02-01  Trent -0.8424958
4 1975-03-01  Tweed -1.0537228

It should be a silly thing but please let me know if you have any suggestion. Thanks

Upvotes: 0

Views: 1135

Answers (2)

J_F
J_F

Reputation: 10352

There is of course a dplyr solution, too:

df %>% 
  group_by(WY, ID) %>% 
    summarise(
      Value = max(Value),
      Date = Date[which.max(Value)]) %>% 
  ungroup() %>% 
   select(ID:Date)

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 269501

Use subset with ave. Note that the function passed to ave returns a logical but ave will coerce it to the class of Value so we use !! to make it logical again. No packages are used.

mx_all <- function(x) if (length(x)) x == max(x)
subset(df, !!ave(Value, ID, WY, FUN = mx_all))

or

mx_first <- function(x) if (length(x)) seq_along(x) == which.max(x)
subset(df, !!ave(Value, ID, WY, FUN = mx_first))

These give the same answer for the sample input and will always give the same answer if there is a unique maximum in each group but if there are multiple maxima in a group then the first one gives all of them and the second gives the first.

Upvotes: 1

Related Questions