Jie

Reputation: 29

Find the maximum date in each group with an extra condition on a different variable; getting the error '& not defined for Date objects'

This is the example data I have:

t<- data.frame(id=c(1,1,2,2,3,3), measureX =c(1,2,1,3,1,1), date=c('2021-1-1','2021-1-2','2021-1-3','2021-1-4','2021-1-5','2021-1-6'))
t$date <- as.Date(t$date)

I want to identify the maximum date in each group, subject to an extra condition on another variable, while preserving the structure of the data. In my real data the condition can vary, so I cannot just subset the dataset to solve the problem. I also need the identified maximum date as an input to other functions, so filtering the data is not a good option, even though it would work for this example. This is the code I tried, but I got the error '& not defined for Date objects':

t%>%group_by(id) %>% max(t$measureX==1 & t$date)

Any idea how to achieve this? Thank you!

Upvotes: 1


Answers (2)

dyrland

Reputation: 648

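First, max() takes a vector of values, not a logical statement, as its argument. In your call, t$measureX==1 & t$date tries to apply & to a Date column, which is what raises the error. So we'll use filter() to do our logic.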

Second, t is a data.frame, so you need to call max() inside summarize() (or summarise(), depending on spelling preference).

library(dplyr)

t <- data.frame(id=c(1,1,2,2,3,3), measureX =c(1,2,1,3,1,1), date=c('2021-1-1','2021-1-2','2021-1-3','2021-1-4','2021-1-5','2021-1-6'))
t$date <- as.Date(t$date)

t %>%
  group_by(id) %>%
  filter(measureX == 1) %>% #condition on another variable
  summarize(max_date = max(date)) #name the result column explicitly

(Inside dplyr verbs you refer to columns by name, so you don't need the t$ prefixes.)
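To make the two points above concrete, here is a minimal sketch (not necessarily the only way to write it) that first reproduces the failure from the question and then runs the filtered summary. The names err and max_by_id are just illustrative, and the output column is named max_date here as an assumption for readability.

```r
library(dplyr)

t <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  measureX = c(1, 2, 1, 3, 1, 1),
  date = as.Date(c("2021-1-1", "2021-1-2", "2021-1-3",
                   "2021-1-4", "2021-1-5", "2021-1-6"))
)

# `&` has no method for Date objects, so combining a logical vector
# with a Date column raises an error; capture its message here.
err <- tryCatch(
  t$measureX == 1 & t$date,
  error = function(e) conditionMessage(e)
)

# Keep only the rows that meet the condition, then take the latest
# date within each id. This returns one row per id.
max_by_id <- t %>%
  filter(measureX == 1) %>%
  group_by(id) %>%
  summarise(max_date = max(date), .groups = "drop")

print(err)
print(max_by_id)
```

Note that this collapses the data to one row per id, so it does not preserve the original structure of the data.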

Update: if we need to identify the maximum date within a group based on a condition, keep every row of the data, and use that value later, we can use this code.

t %>%
  group_by(id) %>%
  mutate(max_date_given_condition = max(date[measureX == 1])) %>% 
  ungroup() #to be safe
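One thing to watch with the mutate() approach: if some group has no row meeting the condition, date[measureX == 1] is empty for that group, and max() on an empty vector warns and does not return a usable date. The sketch below is one possible way to guard against that. The extra group id 4 is made up for illustration and is not part of the original data, and returning NA for such groups is an assumption about what is wanted.

```r
library(dplyr)

t <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4),
  # id 4 is a hypothetical group with no row where measureX == 1
  measureX = c(1, 2, 1, 3, 1, 1, 2),
  date = as.Date(c("2021-1-1", "2021-1-2", "2021-1-3",
                   "2021-1-4", "2021-1-5", "2021-1-6", "2021-1-7"))
)

result <- t %>%
  group_by(id) %>%
  mutate(
    # Per group: latest date among rows meeting the condition,
    # or a missing Date when no row in the group qualifies.
    max_date_given_condition =
      if (any(measureX == 1)) max(date[measureX == 1]) else as.Date(NA)
  ) %>%
  ungroup()

print(result)
```

Every original row is kept, and the new column repeats the group's conditional maximum on each row, so it can be passed to later steps.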

Upvotes: 2

bioinformatics2020

Reputation: 89

library(lubridate)
library(dplyr)

t <- data.frame(id = c(1,1,2,2,3,3), 
               measureX = c(1,2,1,3,1,1), 
               date = c('2021-1-1','2021-1-2','2021-1-3','2021-1-4','2021-1-5','2021-1-6'))

t %>% 
  mutate(date = ymd(date)) %>%
  group_by(id) %>% 
  summarize(max(date))

Upvotes: 0
