Sharath
Sharath

Reputation: 2267

Create a new column which is the max of datetime with conditions on other columns

I have a dataframe like this.

ID <- c("111","111","111","111",
        "113","113","113","113")
ToolID <- c("CCP_A","CCP_B","CCP_B","CCQ_A",
            "CCP_A","CCP_B","CCP_B","CCQ_A")
Step <- c("Step_A","Step_B","Step_B","Step_C",
          "Step_A","Step_B","Step_B","Step_C")
Datetime <- c("2018-04-19 12:44:11","2018-04-17 12:44:11","2018-04-17 12:44:11","2018-04-16 21:11:09",
              "2018-04-13 12:44:11","2018-04-11 12:44:11","2018-04-11 12:44:11","2018-04-10 21:11:09")

df <- data.frame(ID,ToolID,Step,Datetime) 
df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S")

I am trying to create a new column "New.Datetime" that is the MAX(Datetime) grouped by ID only when ToolID == "CCP_B" & Step=="Step_B"

Here is my desired output

   ID ToolID   Step            Datetime        New.Datetime
  111  CCP_A Step_A 2018-04-19 12:44:11 2018-04-17 12:44:11
  111  CCP_B Step_B 2018-04-17 12:44:11 2018-04-17 12:44:11
  111  CCP_B Step_B 2018-04-17 12:44:11 2018-04-17 12:44:11
  111  CCQ_A Step_C 2018-04-16 21:11:09 2018-04-17 12:44:11
  113  CCP_A Step_A 2018-04-13 12:44:11 2018-04-11 12:44:11
  113  CCP_B Step_B 2018-04-11 12:44:11 2018-04-11 12:44:11
  113  CCP_B Step_B 2018-04-11 12:44:11 2018-04-11 12:44:11
  113  CCQ_A Step_C 2018-04-10 21:11:09 2018-04-11 12:44:11

I am trying to do it this way but the new.datetime seems messy with numeric values.

library(dplyr)
df1 <- df %>%
  group_by(ID) %>%
  mutate(New.Datetime = ifelse((ToolID == "CCP_B" & Step=="Step_B"), 
                               max(Datetime), Datetime))

Upvotes: 1

Views: 58

Answers (1)

akrun
akrun

Reputation: 887991

We can use case_when which will work as well

df %>% 
  group_by(ID) %>%
  mutate(New.DateTime =  case_when(ToolID == "CCP_B" & 
                            Step == "Step_B" ~ max(Datetime), 
                            TRUE ~ Datetime))
# A tibble: 8 x 5
# Groups:   ID [2]
#  ID    ToolID Step   Datetime            New.DateTime       
#  <fct> <fct>  <fct>  <dttm>              <dttm>             
#1 111   CCP_A  Step_A 2018-04-19 12:44:11 2018-04-19 12:44:11
#2 111   CCP_B  Step_B 2018-04-17 12:44:11 2018-04-19 12:44:11
#3 111   CCP_B  Step_B 2018-04-17 12:44:11 2018-04-19 12:44:11
#4 111   CCQ_A  Step_C 2018-04-16 21:11:09 2018-04-16 21:11:09
#5 113   CCP_A  Step_A 2018-04-13 12:44:11 2018-04-13 12:44:11
#6 113   CCP_B  Step_B 2018-04-11 12:44:11 2018-04-13 12:44:11
#7 113   CCP_B  Step_B 2018-04-11 12:44:11 2018-04-13 12:44:11
#8 113   CCQ_A  Step_C 2018-04-10 21:11:09 2018-04-10 21:11:09

As the OP clarified in the comments to have the max value based on the condition for ecah 'ID', we just neeed

df %>%
  group_by(ID) %>%
  mutate(New.DateTime = max(Datetime[ToolID == "CCP_B" & Step == "Step_B"]))
# A tibble: 8 x 5
# Groups:   ID [2]
#  ID    ToolID Step   Datetime            New.DateTime       
#  <fct> <fct>  <fct>  <dttm>              <dttm>             
#1 111   CCP_A  Step_A 2018-04-19 12:44:11 2018-04-17 12:44:11
#2 111   CCP_B  Step_B 2018-04-17 12:44:11 2018-04-17 12:44:11
#3 111   CCP_B  Step_B 2018-04-17 12:44:11 2018-04-17 12:44:11
#4 111   CCQ_A  Step_C 2018-04-16 21:11:09 2018-04-17 12:44:11
#5 113   CCP_A  Step_A 2018-04-13 12:44:11 2018-04-11 12:44:11
#6 113   CCP_B  Step_B 2018-04-11 12:44:11 2018-04-11 12:44:11
#7 113   CCP_B  Step_B 2018-04-11 12:44:11 2018-04-11 12:44:11
#8 113   CCQ_A  Step_C 2018-04-10 21:11:09 2018-04-11 12:44:11

Upvotes: 3

Related Questions