Joezerz
Joezerz

Reputation: 151

Aggregate, but function uses two columns

I'm sure this has been asked 1000 times, but I can't find the question, and can't figure it out.

I have a data.frame, with a location (a factor), a date, and a variable.

I want to find the date on which the variable is maximized, for each location.

df = data.frame(FAC = factor(rep(c("A","B","C"),each=5)), VAR = runif(15), DATE = rep(as.Date(c("2000-01-01","2000-01-02","2000-01-03","2000-01-04","2000-01-05"))))

I can easily (but messily) do this with a for loop:

df_summary = data.frame(FAC = levels(df$FAC),date=as.Date(character(1)))
for(i in seq_along(levels(df$FAC))){
  df_subset = subset(df,FAC == levels(df$FAC)[i])
  max_date = df_subset$DATE[which.max(df_subset$VAR)]
  df_summary$date[df_summary$FAC == levels(df$FAC)[i]] = max_date
}

But I imagine there's a 'nice' way either with aggregate or dplyr, but I can't figure it out.

My (failed) attempts:

aggregate(x=df$DATE,by=list(df$FAC),FUN=function(x) x[which.max(df$VAR)])

This doesn't work, because df$VAR isn't subset in the function.

And I don't really know how to use dplyr because I generally use base R.

Any suggestions?

Upvotes: 0

Views: 32

Answers (2)

akrun
akrun

Reputation: 887851

We can use

library(dplyr)
df %>%
    arrange(FAC, desc(VAR)) %>%
    group_by(FAC) %>%
    slice(1)

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389235

In dplyr, you can do -

library(dplyr)

df %>% group_by(FAC) %>% summarise(max_date = DATE[which.max(VAR)])

In data.table -

library(data.table)
setDT(df)[, .(max_date = DATE[which.max(VAR)]), FAC]

Upvotes: 1

Related Questions