I have an Excel file with a column 'Time' that holds clock times. When I load the file into R, the column becomes POSIXct and a placeholder date (1899-12-31) is prepended to every time. I want to drop that date, keep only the time part, and then calculate the difference between consecutive rows grouped by Emp_Id and Date, so that I can see the gap between clock-in and clock-out for each employee on each day.
This is how my data looks when loaded in R, with the extra date added:
| Emp_Id | Date | Time | Time_Event |
|--------|:---------:|---------------------:|------------|
| 95 | 3/14/2019 | 1899-12-31 10:47:12 | Clock-In |
| 95 | 3/12/2019 | 1899-12-31 10:51:12 | Clock-In |
| 95 | 3/11/2019 | 1899-12-31 8:15:16 | Clock-Out |
| 95 | 3/12/2019 | 1899-12-31 8:10:07 | Clock-Out |
| 95 | 3/11/2019 | 1899-12-31 10:41:51 | Clock-In |
| 19 | 3/14/2019 | 1899-12-31 6:02:23 | Clock-Out |
| 19 | 3/18/2019 | 1899-12-31 5:44:23 | Clock-In |
| 19 | 3/12/2019 | 1899-12-31 6:05:15 | Clock-Out |
| 19 | 3/12/2019 | 1899-12-31 5:45:57 | Clock-In |
| 19 | 3/14/2019 | 1899-12-31 5:29:32 | Clock-In |
To make it reproducible, here is the data:

```r
Emp_Id <- as.numeric(c("95", "95", "95", "95", "95", "19", "19", "19", "19", "19"))
Date <- c("3/14/2019", "3/12/2019", "3/11/2019", "3/12/2019", "3/11/2019", "3/14/2019", "3/18/2019", "3/12/2019", "3/12/2019", "3/14/2019")
Time <- as.POSIXct(c("1899-12-31 10:47:12", "1899-12-31 10:51:12", "1899-12-31 8:15:16", "1899-12-31 8:10:07", "1899-12-31 10:41:51",
                     "1899-12-31 6:02:23", "1899-12-31 5:44:23", "1899-12-31 6:05:15", "1899-12-31 5:45:57", "1899-12-31 5:29:32"))
Time_Event <- c("Clock-In", "Clock-In", "Clock-Out", "Clock-Out", "Clock-In", "Clock-Out", "Clock-In", "Clock-Out", "Clock-In", "Clock-In")
df2 <- data.frame(Emp_Id, Date, Time, Time_Event, stringsAsFactors = FALSE)
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")
```
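Note that the sample vectors store clock-out times such as 6:05:15, while the expected output further down assumes 18:05:15, so the pair used here is taken from the expected output (Emp_Id 19 on 3/12/2019). A minimal sketch, assuming `tz = "UTC"` and a hypothetical helper name `DateTime`, of attaching the real Date to the clock time so each row becomes a full timestamp instead of a 1899-12-31 one:

```r
# Clock-in/clock-out pair for Emp_Id 19 on 2019-03-12 (values from the expected output)
Date <- as.Date(c("2019-03-12", "2019-03-12"))
Time <- as.POSIXct(c("1899-12-31 5:45:57", "1899-12-31 18:05:15"), tz = "UTC")

# Keep only the clock time as text, paste it onto the real date, and parse again.
# `DateTime` is a hypothetical name; tz = "UTC" is an assumption.
DateTime <- as.POSIXct(paste(Date, format(Time, "%H:%M:%S")), tz = "UTC")
DateTime
# [1] "2019-03-12 05:45:57 UTC" "2019-03-12 18:05:15 UTC"

# The result is still POSIXct, so subtraction works; convert the difftime to seconds
as.numeric(diff(DateTime), units = "secs")
# [1] 44358
```

On the full data frame the same idea would read `df2$DateTime <- as.POSIXct(paste(df2$Date, format(df2$Time, "%H:%M:%S")))`, with the time zone chosen to match how the file was read.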
Using `df2$Time <- format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S")` strips the date part, but it converts the column to character. Since I need to calculate differences, I cannot work with a character column. I have been through How to calculate time difference in consecutive rows, but it doesn't help.
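One way to drop the date but keep a numeric type is to convert each time to seconds since midnight. A minimal sketch using base R's POSIXlt components (`hour`, `min`, `sec`), assuming `tz = "UTC"`; `secs_of_day` is a hypothetical name:

```r
# Clock-in/clock-out pair for Emp_Id 19 on 3/12/2019 (values from the expected output)
Time <- as.POSIXct(c("1899-12-31 5:45:57", "1899-12-31 18:05:15"), tz = "UTC")

# POSIXlt exposes the clock components, so the placeholder date is simply ignored
lt <- as.POSIXlt(Time)
secs_of_day <- lt$hour * 3600 + lt$min * 60 + lt$sec
secs_of_day
# [1] 20757 65115

# Plain numbers, so consecutive differences come out directly in seconds
diff(secs_of_day)
# [1] 44358
```

Because the result is an ordinary numeric vector, it can be used with `lag()` or `shift()` inside a grouped pipeline, and there are no difftime units to worry about.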
I tried the code below, but I get an error because the column is character:

```r
df2 <- df2 %>%
  arrange(df2$Emp_Id, df2$Date, df2$Time) %>%
  group_by(df2$Emp_Id, df2$Date) %>%
  mutate(diff = format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S") -
           lag(format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S"),
               default = format(strptime(df2$Time, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))
```
How can I get the final output to look like this?
| Emp_Id | Date | Time | Time_Event | Diff (seconds) |
|--------|:---------:|---------:|------------|------------------|
| 19 | 3/12/2019 | 5:45:57 | Clock-In | NA |
| 19 | 3/12/2019 | 18:05:15 | Clock-Out | 44358 |
| 19 | 3/14/2019 | 5:29:32 | Clock-In | NA |
| 19 | 3/14/2019 | 18:02:23 | Clock-Out | 45171 |
| 19 | 3/18/2019 | 17:44:23 | Clock-In | NA |
| 95 | 3/11/2019 | 10:41:51 | Clock-In | NA |
| 95 | 3/11/2019 | 20:15:16 | Clock-Out | 33844 |
| 95 | 3/12/2019 | 10:51:12 | Clock-In | NA |
| 95 | 3/12/2019 | 20:10:07 | Clock-Out | 33535 |
| 95 | 3/14/2019 | 10:47:12 | Clock-In | NA |
We can use
```r
library(data.table)
setDT(df2)
setorder(df2, Emp_Id, Date, Time)  # sort so consecutive rows are in time order
df2[, Diff := as.numeric(Time - shift(Time), units = "secs"), by = .(Emp_Id, Date)]
```
```r
library(dplyr)
library(tidyr)
df2 %>%
  arrange(Emp_Id, Date, Time) %>%
  group_by(Emp_Id, Date) %>%
  mutate(Diff = as.numeric(Time - lag(Time), units = "secs")) %>%
  ungroup()
```
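Two details make this work without stripping the date. Every Time value shares the same 1899-12-31 placeholder, so it cancels out in the subtraction. And `units = "secs"` matters: subtracting POSIXct values yields a difftime whose unit R picks automatically. A small sketch, assuming `tz = "UTC"` and using the Emp_Id 19 pair from the expected output:

```r
t_in  <- as.POSIXct("1899-12-31 5:45:57", tz = "UTC")
t_out <- as.POSIXct("1899-12-31 18:05:15", tz = "UTC")

d <- t_out - t_in                # difftime; the shared placeholder date cancels out
units(d)                         # unit chosen automatically for a 12-hour gap
# [1] "hours"
as.numeric(d)                    # bare number in that automatic unit
# [1] 12.32167
as.numeric(d, units = "secs")    # explicit conversion gives seconds
# [1] 44358
```

Without `units = "secs"`, a grouped `mutate()` could silently return hours for some groups and minutes for others, depending on the size of the gaps.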