Reputation: 29
This is the example data I have:
t<- data.frame(id=c(1,1,2,2,3,3), measureX =c(1,2,1,3,1,1), date=c('2021-1-1','2021-1-2','2021-1-3','2021-1-4','2021-1-5','2021-1-6'))
t$date <- as.Date(t$date)
I want to identify the maximum date in each group, subject to an extra condition on another variable, while preserving the structure of the data. In my real data the condition varies, so I cannot simply subset the dataset to solve the problem. I also need the identified maximum date as input to other functions, so filtering the data is not a good option, even though it would work for this example. This is the code I tried, but it gave the error '& not defined for Date objects':
t%>%group_by(id) %>% max(t$measureX==1 & t$date)
Any idea how to achieve this? Thank you!
Upvotes: 1
Views: 1244
Reputation: 648
First, max() takes a vector, not a logical statement, as its argument. So we'll use filter() to do our logic.
Second, t is a data.frame, so you need to use max() inside the summarize() (or summarise(), depending on spelling preferences) command.
library(dplyr) # provides %>%, group_by(), filter() and summarize()

t <- data.frame(id = c(1,1,2,2,3,3), measureX = c(1,2,1,3,1,1), date = c('2021-1-1','2021-1-2','2021-1-3','2021-1-4','2021-1-5','2021-1-6'))
t$date <- as.Date(t$date)
t %>%
  group_by(id) %>%
  filter(measureX == 1) %>% # condition on another variable
  summarize(max(date))      # one row per id: the latest date among the kept rows
(When using pipes, you don't need the t$ prefixes.)
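To illustrate the note about t$ prefixes, here is a minimal sketch, assuming dplyr is installed. The names by_group and whole_column are only for illustration. A bare column name is evaluated within each group, while t$date always refers to the whole column, so the grouping is silently ignored.

```r
library(dplyr)

t <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03",
                                 "2021-01-04", "2021-01-05", "2021-01-06")))

# Bare column name: max() sees only the dates of the current group
by_group <- t %>%
  group_by(id) %>%
  summarize(max_date = max(date))

# t$date: max() sees the entire column, so every group gets the overall maximum
whole_column <- t %>%
  group_by(id) %>%
  summarize(max_date = max(t$date))
```

So dropping the prefix is not just a matter of style here: with t$date the per-group result would be the same date for every id.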
Update - if we need to identify the maximum date within a group based on a condition and then use that variable later, we can use this code.
t %>%
  group_by(id) %>%
  mutate(max_date_given_condition = max(date[measureX == 1])) %>%
  ungroup() # to be safe
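One thing to watch with the mutate() version: if a group has no row meeting the condition, max() is called on an empty vector, which gives a warning and a non-finite value. The sketch below is one possible guard, not the only one. It adds a hypothetical fourth group (id 4, not in the original data) with no measureX == 1 row, and returns NA for it.

```r
library(dplyr)

# Original example data plus one hypothetical row (id 4) that never meets the condition
t <- data.frame(id = c(1, 1, 2, 2, 3, 3, 4),
                measureX = c(1, 2, 1, 3, 1, 1, 2),
                date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03",
                                 "2021-01-04", "2021-01-05", "2021-01-06",
                                 "2021-01-07")))

result <- t %>%
  group_by(id) %>%
  mutate(max_date_given_condition =
           if (any(measureX == 1)) max(date[measureX == 1]) else as.Date(NA)) %>%
  ungroup()
```

All rows are kept, so result can be passed on to later steps, and the NA marks groups where the condition never held.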
Upvotes: 2
Reputation: 89
library(lubridate)
library(dplyr)
t <- data.frame(id = c(1,1,2,2,3,3),
                measureX = c(1,2,1,3,1,1),
                date = c('2021-1-1','2021-1-2','2021-1-3','2021-1-4','2021-1-5','2021-1-6'))
t %>%
  mutate(date = ymd(date)) %>%
  group_by(id) %>%
  summarize(max(date))
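Note that, as written, this returns each group's overall maximum date and does not apply the condition on measureX. A minimal sketch of how the same pipeline could include the condition, assuming lubridate and dplyr are installed (conditional_max is an illustrative name):

```r
library(dplyr)
library(lubridate)

t <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                measureX = c(1, 2, 1, 3, 1, 1),
                date = c("2021-1-1", "2021-1-2", "2021-1-3",
                         "2021-1-4", "2021-1-5", "2021-1-6"))

conditional_max <- t %>%
  mutate(date = ymd(date)) %>%                     # parse the strings as dates
  group_by(id) %>%
  summarize(max_date = max(date[measureX == 1]))   # only rows meeting the condition
```

For id 1 this gives 2021-01-01 rather than 2021-01-02, because the later row has measureX equal to 2.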
Upvotes: 0