Reputation: 108
Here my dataframe:
df = read.csv(text = '"Date","Value","ID","WY"
1975-02-01,-1.16543693088,"Tweed",1975
1975-03-01,-1.05372283483,"Tweed",1975
1975-04-01,-1.06632370439,"Tweed",1975
1975-05-01,-1.18903485356,"Tweed",1975
1992-05-01,-1.04737467143,"Ouse",1992
1992-06-01,-1.4058281451,"Ouse",1992
1992-07-01,-1.13608647243,"Ouse",1992
1992-08-01,-0.802566581309,"Ouse",1992
1992-09-01,-0.551433852821,"Ouse",1992
1992-10-01,-0.625997598552,"Ouse",1993
1992-11-01,-0.483559758609,"Ouse",1993
1992-12-01,-0.792013395632,"Ouse",1993
1993-01-01,-0.754618121962,"Ouse",1993
1993-02-01,-1.2504282139,"Ouse",1993
1996-01-01,-0.945410385985,"Trent",1996
1996-02-01,-0.84249575782,"Trent",1996
1996-03-01,-1.10332425045,"Trent",1996
1996-04-01,-1.22634133042,"Trent",1996
1996-05-01,-1.2335181635,"Trent",1996
1996-06-01,-1.23451130358,"Trent",1996
1996-07-01,-1.25902677738,"Trent",1996
1996-08-01,-1.13068733413,"Trent",1996', header = TRUE)
I need to find the annual maximum value for each ID and WY group.
The following code do the trick very easily but its output only shows the year of each annual maximum whereas I am interested also in the relative month and day:
df_AMAX = aggregate(df$Value, by = list(df$WY, df$ID), max)
colnames(df_AMAX) = c('Date', 'ID', 'Value')
print(df_AMAX)
Date ID Value
1 1992 Ouse -0.5514339
2 1993 Ouse -0.4835598
3 1996 Trent -0.8424958
4 1975 Tweed -1.0537228
My output should be:
Date ID Value
1 1992-09-01 Ouse -0.5514339
2 1993-11-01 Ouse -0.4835598
3 1996-02-01 Trent -0.8424958
4 1975-03-01 Tweed -1.0537228
It should be a silly thing but please let me know if you have any suggestion. Thanks
Upvotes: 0
Views: 1135
Reputation: 10352
There is of course a dplyr
solution, too:
df %>%
group_by(WY, ID) %>%
summarise(
Value = max(Value),
Date = Date[which.max(Value)]) %>%
ungroup() %>%
select(ID:Date)
Upvotes: 0
Reputation: 269501
Use subset
with ave
. Note that the function passed to ave
returns a logical but ave
will coerce it to the class of Value
so we use !! to make it logical again. No packages are used.
mx_all <- function(x) if (length(x)) x == max(x)
subset(df, !!ave(Value, ID, WY, FUN = mx_all))
or
mx_first <- function(x) if (length(x)) seq_along(x) == which.max(x)
subset(df, !!ave(Value, ID, WY, FUN = mx_first))
These give the same answer for the sample input and will always give the same answer if there is a unique maximum in each group but if there are multiple maxima in a group then the first one gives all of them and the second gives the first.
Upvotes: 1