Reputation: 151
I'm sure this has been asked 1000 times, but I can't find the question, and can't figure it out.
I have a data.frame, with a location (a factor), a date, and a variable.
I want to find the date on which the variable is maximized, for each location.
df = data.frame(FAC = factor(rep(c("A","B","C"),each=5)), VAR = runif(15), DATE = rep(as.Date(c("2000-01-01","2000-01-02","2000-01-03","2000-01-04","2000-01-05"))))
I can easily (but messily) do this with a for
loop:
df_summary = data.frame(FAC = levels(df$FAC),date=as.Date(character(1)))
for(i in seq_along(levels(df$FAC))){
df_subset = subset(df,FAC == levels(df$FAC)[i])
max_date = df_subset$DATE[which.max(df_subset$VAR)]
df_summary$date[df_summary$FAC == levels(df$FAC)[i]] = max_date
}
But I imagine there's a 'nice' way either with aggregate
or dplyr
, but I can't figure it out.
My (failed) attempts:
aggregate(x=df$DATE,by=list(df$FAC),FUN=function(x) x[which.max(df$VAR)])
This doesn't work, because df$VAR
isn't subset in the function.
And I don't really know how to use dplyr
because I generally use base R.
Any suggestions?
Upvotes: 0
Views: 32
Reputation: 887851
We can use
library(dplyr)
df %>%
arrange(FAC, desc(VAR)) %>%
group_by(FAC) %>%
slice(1)
Upvotes: 0
Reputation: 389235
In dplyr
, you can do -
library(dplyr)
df %>% group_by(FAC) %>% summarise(max_date = DATE[which.max(VAR)])
In data.table
-
library(data.table)
setDT(df)[, .(max_date = DATE[which.max(VAR)]), FAC]
Upvotes: 1