RitaM
RitaM

Reputation: 143

dplyr: Mutate, with conditional if applied to "the row before"

I have two datatime columns that sometimes overlap wrongly and I need to correct them, to make continuous intervals. The correction must be performed on the EndSession (row (n-1)) This is a dummy example, from my original dataset.

datateste5 <- data.frame(id = 1:9,
                    devices = c("A","B","C","A","A","B","C","B","A"),
                    chanell = c("111","222","333","111","333","111","222","333","111"),
                    StartSession = c("2021-12-01  00:00:00","2021-12-01  04:15:00","2021-12-01  00:15:00","2021-12-01  01:55:00","2021-12-01  00:35:00","2021-12-01  05:00:00","2021-12-01  04:07:00","2021-12-01  06:15:00","2021-12-02  01:45:00"),
                    EndSession = c("2021-12-01  00:02:00","2021-12-01  05:15:00","2021-12-01  04:30:00","2021-12-01  02:05:00","2021-12-01  01:45:00","2021-12-01  06:30:00","2021-12-01  09:15:00","2021-12-01  07:10:00","2021-12-01  02:22:00"))

This is the result I intend to achieve.

id Device  Chanell      StartSession           EndSession              StartSessionN             EndSessionN   
1    A       111    2021-12-01 00:00:00     2021-12-01 00:02:00    2021-12-01 00:00:00      2021-12-01 00:02:00
5    A       333    2021-12-01 00:35:00     2021-12-01 01:45:00    2021-12-01 00:35:00      2021-12-01 01:55:00
4    A       111    2021-12-01 01:55:00     2021-12-01 02:05:00    2021-12-01 01:55:00      2021-12-01 02:05:00
9    A       111    2021-12-02 01:45:00     2021-12-01 02:22:00    2021-12-01 01:45:00      2021-12-01 02:22:00
2    B       222    2021-12-01 04:15:00     2021-12-01 05:15:00    2021-12-01 04:15:00      2021-12-01 05:00:00
6    B       111    2021-12-01 05:00:00     2021-12-01 06:30:00    2021-12-01 05:00:00      2021-12-01 06:15:00
8    B       333    2021-12-01 06:15:00     2021-12-01 07:10:00    2021-12-01 06:15:00      2021-12-01 07:10:00
3    C       333    2021-12-01 00:15:00     2021-12-01 04:30:00    2021-12-01 00:15:00      2021-12-01 04:07:00
7    C       222    2021-12-01 04:07:00     2021-12-01 09:15:00    2021-12-02 04:07:00      2021-12-01 09:15:00

I would like to use dplyr, since for me it is easier to understand. Still very "green" in R

I don't know how to refer to the previous line, but assuming that this is possible, the idea would be.

    datateste5$StartSession <- as.POSIXct(datateste5$StartSession,format = "%Y-%m-%d %H:%M:%S")
    datateste5$EndSession <- as.POSIXct(datateste5$EndSession,format = "%Y-%m-%d %H:%M:%S")

datateste6 <- datateste5 %>%
  arrange(devices, StartSession) %>%
  mutate (StartSessionN = StartSession)) %>%
  mutate(EndSessionN = if_else (StartSession (row(n+1)) < EndSession (row(n)), StartSession (row(n+1)), EndSession))%>% 
datateste6 

I know it’s not correct, it was just to reference the rows

Upvotes: 1

Views: 400

Answers (1)

akrun
akrun

Reputation: 887148

After arrangeing the dataset by 'devices', 'StartSession', create the 'EndSessionN' by using case_when i.e. if the next value of 'StartSession' (lead) is less than 'EndSession' return the next value of 'StartSession' or else return EndSession

library(dplyr)
datateste5 %>% 
  arrange(devices, StartSession) %>% 
  mutate(StartSessionN = StartSession,
         EndSessionN = case_when(lead(StartSession) < EndSession
            ~ lead(StartSession), TRUE ~ EndSession))

Upvotes: 2

Related Questions