Reputation: 143
I have two datatime columns that sometimes overlap wrongly and I need to correct them, to make continuous intervals. The correction must be performed on the EndSession (row (n-1)) This is a dummy example, from my original dataset.
datateste5 <- data.frame(id = 1:9,
devices = c("A","B","C","A","A","B","C","B","A"),
chanell = c("111","222","333","111","333","111","222","333","111"),
StartSession = c("2021-12-01 00:00:00","2021-12-01 04:15:00","2021-12-01 00:15:00","2021-12-01 01:55:00","2021-12-01 00:35:00","2021-12-01 05:00:00","2021-12-01 04:07:00","2021-12-01 06:15:00","2021-12-02 01:45:00"),
EndSession = c("2021-12-01 00:02:00","2021-12-01 05:15:00","2021-12-01 04:30:00","2021-12-01 02:05:00","2021-12-01 01:45:00","2021-12-01 06:30:00","2021-12-01 09:15:00","2021-12-01 07:10:00","2021-12-01 02:22:00"))
This is the result I intend to achieve.
id Device Chanell StartSession EndSession StartSessionN EndSessionN
1 A 111 2021-12-01 00:00:00 2021-12-01 00:02:00 2021-12-01 00:00:00 2021-12-01 00:02:00
5 A 333 2021-12-01 00:35:00 2021-12-01 01:45:00 2021-12-01 00:35:00 2021-12-01 01:55:00
4 A 111 2021-12-01 01:55:00 2021-12-01 02:05:00 2021-12-01 01:55:00 2021-12-01 02:05:00
9 A 111 2021-12-02 01:45:00 2021-12-01 02:22:00 2021-12-01 01:45:00 2021-12-01 02:22:00
2 B 222 2021-12-01 04:15:00 2021-12-01 05:15:00 2021-12-01 04:15:00 2021-12-01 05:00:00
6 B 111 2021-12-01 05:00:00 2021-12-01 06:30:00 2021-12-01 05:00:00 2021-12-01 06:15:00
8 B 333 2021-12-01 06:15:00 2021-12-01 07:10:00 2021-12-01 06:15:00 2021-12-01 07:10:00
3 C 333 2021-12-01 00:15:00 2021-12-01 04:30:00 2021-12-01 00:15:00 2021-12-01 04:07:00
7 C 222 2021-12-01 04:07:00 2021-12-01 09:15:00 2021-12-02 04:07:00 2021-12-01 09:15:00
I would like to use dplyr, since for me it is easier to understand. Still very "green" in R
I don't know how to refer to the previous line, but assuming that this is possible, the idea would be.
datateste5$StartSession <- as.POSIXct(datateste5$StartSession,format = "%Y-%m-%d %H:%M:%S")
datateste5$EndSession <- as.POSIXct(datateste5$EndSession,format = "%Y-%m-%d %H:%M:%S")
datateste6 <- datateste5 %>%
arrange(devices, StartSession) %>%
mutate (StartSessionN = StartSession)) %>%
mutate(EndSessionN = if_else (StartSession (row(n+1)) < EndSession (row(n)), StartSession (row(n+1)), EndSession))%>%
datateste6
I know it’s not correct, it was just to reference the rows
Upvotes: 1
Views: 400
Reputation: 887148
After arrange
ing the dataset by 'devices', 'StartSession', create the 'EndSessionN' by using case_when
i.e. if the next value of 'StartSession' (lead
) is less than 'EndSession' return the next value of 'StartSession' or else return EndSession
library(dplyr)
datateste5 %>%
arrange(devices, StartSession) %>%
mutate(StartSessionN = StartSession,
EndSessionN = case_when(lead(StartSession) < EndSession
~ lead(StartSession), TRUE ~ EndSession))
Upvotes: 2