hk2

Reputation: 487

Extract the time part from a datetime and calculate the time difference between consecutive rows using R

I have an Excel file with a column 'Time' that holds a time of day. The data type of this column is POSIXct. When I load the Excel file in R, an arbitrary date (1899-12-31) gets attached to each time. I want to drop that date, keep only the time part, and then calculate the difference between consecutive rows grouped by the Emp_Id and Date columns, so that I can see how much time passed between clock-in and clock-out each day for each employee.

This is what my data looks like when loaded in R, with the arbitrary date added.

| Emp_Id |    Date   |                 Time | Time_Event |
|--------|:---------:|---------------------:|------------|
| 95     | 3/14/2019 | 1899-12-31 10:47:12  | Clock-In   |
| 95     | 3/12/2019 | 1899-12-31 10:51:12  | Clock-In   |
| 95     | 3/11/2019 | 1899-12-31 8:15:16   | Clock-Out  |
| 95     | 3/12/2019 | 1899-12-31 8:10:07   | Clock-Out  |
| 95     | 3/11/2019 | 1899-12-31 10:41:51  | Clock-In   |
| 19     | 3/14/2019 | 1899-12-31 6:02:23   | Clock-Out  |
| 19     | 3/18/2019 | 1899-12-31 5:44:23   | Clock-In   |
| 19     | 3/12/2019 | 1899-12-31 6:05:15   | Clock-Out  |
| 19     | 3/12/2019 | 1899-12-31 5:45:57   | Clock-In   |
| 19     | 3/14/2019 | 1899-12-31 5:29:32   | Clock-In   |

For convenience, here is the same data as R code:

Emp_Id <- as.numeric(c("95", "95", "95", "95", "95", "19", "19", "19", "19", "19"))
Date <- c("3/14/2019","3/12/2019","3/11/2019", "3/12/2019","3/11/2019","3/14/2019","3/18/2019","3/12/2019","3/12/2019","3/14/2019")
Time <- as.POSIXct(c("1899-12-31 10:47:12", "1899-12-31 10:51:12", "1899-12-31 8:15:16","1899-12-31 8:10:07", "1899-12-31 10:41:51",
"1899-12-31 6:02:23", "1899-12-31 5:44:23", "1899-12-31 6:05:15", "1899-12-31 5:45:57","1899-12-31 5:29:32"))
Time_Event <- c("Clock-In","Clock-In","Clock-Out","Clock-Out","Clock-In","Clock-Out","Clock-In","Clock-Out","Clock-In","Clock-In")
df2 <- data.frame(Emp_Id, Date, Time, Time_Event, stringsAsFactors = FALSE)
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")

Using df2$Time <- format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S") keeps only the time part but converts the column to character. Since I need to calculate a difference, I cannot do this on character data. I have been through the linked question "How to calculate time difference in consecutive rows", but it doesn't help.
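One way to keep the time of day numeric is sketched below in base R: subtract midnight of the same (dummy) day, which yields seconds since midnight, and keep the character form only for display. The two sample values, the `tz = "UTC"` setting, and the names `midnight`, `Time_secs` and `Time_label` are assumptions made for illustration, not part of the original data frame.

```r
# Small stand-in for the Time column; tz = "UTC" is an assumption
# that keeps the example reproducible across machines
Time <- as.POSIXct(c("1899-12-31 05:45:57", "1899-12-31 18:05:15"), tz = "UTC")

# Midnight of the same dummy day for each value
midnight <- as.POSIXct(trunc(Time, units = "days"))

# Seconds since midnight: numeric, so it can be subtracted
Time_secs <- as.numeric(difftime(Time, midnight, units = "secs"))

# Character label, for display only
Time_label <- format(Time, "%H:%M:%S")

diff(Time_secs)   # 44358
```

Because the dummy date is the same on every row, subtracting the POSIXct values directly gives the same difference; the seconds-since-midnight column is only needed if the date part must really be gone.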

I have tried the code below, but I get an error because of the character data type:

df2 <- df2 %>%
  arrange(df2$Emp_Id, df2$Date, df2$Time) %>% 
  group_by(df2$Emp_Id,df2$Date) %>%
  mutate(diff = format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"),"%H:%M:%S")- 
           lag(format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"),"%H:%M:%S"),
               default = format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"),"%H:%M:%S")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))

How can I get the final output to look like this?

| Emp_Id |    Date   |     Time | Time_Event | Diff(In seconds) |
|--------|:---------:|---------:|------------|------------------|
| 19     | 3/12/2019 |  5:45:57 | Clock-In   | NA               |
| 19     | 3/12/2019 | 18:05:15 | Clock-Out  | 44358            |
| 19     | 3/14/2019 |  5:29:32 | Clock-In   | NA               |
| 19     | 3/14/2019 | 18:02:23 | Clock-Out  | 45171            |
| 19     | 3/18/2019 | 17:44:23 | Clock-In   | NA               |
| 95     | 3/11/2019 | 10:41:51 | Clock-In   | NA               |
| 95     | 3/11/2019 | 20:15:16 | Clock-Out  | 33844            |
| 95     | 3/12/2019 | 10:51:12 | Clock-In   | NA               |
| 95     | 3/12/2019 | 20:10:07 | Clock-Out  | 33535            |
| 95     | 3/14/2019 | 10:47:12 | Clock-In   | NA               |

Upvotes: 1

Views: 208

Answers (2)

akrun

Reputation: 886938

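We can use data.table, ordering the rows and taking the difference within each Emp_Id and Date group:

library(data.table)
# Diff is a new column; df2 is the data frame from the question
setDT(df2)[order(Emp_Id, Date, Time),
           Diff := as.numeric(Time - shift(Time), units = "secs"),
           by = .(Emp_Id, Date)]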
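When two POSIXct values are subtracted, R chooses the units of the resulting difftime automatically, so a bare `as.numeric()` can return hours for one pair and minutes for another. A small base-R sketch of that behaviour, using made-up timestamps and an assumed `tz = "UTC"`:

```r
# Made-up timestamps; tz = "UTC" is an assumption for reproducibility
clock_in  <- as.POSIXct("2019-03-12 05:45:57", tz = "UTC")
clock_out <- as.POSIXct("2019-03-12 18:05:15", tz = "UTC")
short_gap <- as.POSIXct("2019-03-12 05:55:57", tz = "UTC")

long_diff  <- clock_out - clock_in   # more than an hour apart
short_diff <- short_gap - clock_in   # ten minutes apart

units(long_diff)    # "hours"
units(short_diff)   # "mins"

# Asking for seconds explicitly gives comparable numbers
as.numeric(long_diff,  units = "secs")   # 44358
as.numeric(short_diff, units = "secs")   # 600
```

Passing `units = "secs"` to `as.numeric()` is therefore a safe habit whenever differences from several groups end up in the same column.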

Upvotes: 1

d.b

Reputation: 32548

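library(dplyr)

df2 %>%
    # sort so that lag() sees the previous event of the same employee and day
    arrange(Emp_Id, Date, Time) %>%
    group_by(Emp_Id, Date) %>%
    # seconds since the previous row in the group; NA for the first row
    mutate(Diff = as.numeric(Time - lag(Time), units = "secs")) %>%
    ungroup()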
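For checking the result without extra packages, the same grouped difference can be sketched in base R with `ave()`. The rows below are the Emp_Id 19 rows from the expected output in the question, which assumes the times are on a 24-hour clock; the `tz = "UTC"` setting and the helper name `key` are assumptions for illustration.

```r
# Emp_Id 19 rows from the question's expected output, deliberately unsorted
df <- data.frame(
  Emp_Id = c(19, 19, 19, 19, 19),
  Date = as.Date(c("2019-03-14", "2019-03-18", "2019-03-12",
                   "2019-03-12", "2019-03-14")),
  Time = as.POSIXct(c("1899-12-31 18:02:23", "1899-12-31 17:44:23",
                      "1899-12-31 18:05:15", "1899-12-31 05:45:57",
                      "1899-12-31 05:29:32"), tz = "UTC"),
  Time_Event = c("Clock-Out", "Clock-In", "Clock-Out", "Clock-In", "Clock-In"),
  stringsAsFactors = FALSE
)

# Sort so that consecutive rows belong to the same employee and day
df <- df[order(df$Emp_Id, df$Date, df$Time), ]

# One grouping key per employee and day (hypothetical helper column)
key <- paste(df$Emp_Id, df$Date)

# Seconds since the previous row within each group; NA for the first row
df$Diff <- ave(as.numeric(df$Time), key, FUN = function(x) c(NA, diff(x)))

# Replace the datetime with a time-only label once the arithmetic is done
df$Time <- format(df$Time, "%H:%M:%S")

df$Diff   # NA 44358 NA 45171 NA
```

Formatting `Time` as the last step keeps the column numeric for as long as arithmetic is needed, which sidesteps the character-subtraction error from the question.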

Upvotes: 4
