Reputation: 407
I have been trying to move my calculations from excel to R and just wonder if there is a way to replicate IF (that has a step).
So my data is below and the results I get in excel with simple formula are in DIFF column (=IF(A2=A3, (C2-B3) * 24, 0)
)
NO T_DATE L_DATE DIFF
AAA 10/08/2019 17:02:00 10/08/2019 20:35:00 5.83
AAA 10/08/2019 14:45:00 10/08/2019 15:10:00 11.78
AAA 10/08/2019 03:23:00 10/08/2019 10:25:00 17.32
AAA 09/08/2019 17:06:00 10/08/2019 01:11:00 25.70
AAA 08/08/2019 23:29:00 09/08/2019 10:27:00 0
BBB 08/08/2019 09:34:00 08/08/2019 21:19:00 22.23
BBB 07/08/2019 23:05:00 08/08/2019 06:09:00 18.03
BBB 07/08/2019 12:07:00 07/08/2019 20:25:00 22.32
BBB 06/08/2019 22:06:00 07/08/2019 08:53:00 22.77
BBB 06/08/2019 10:07:00 06/08/2019 19:44:00 0
I have been trying in R with nil luck. Code to get the data frame is below:
library(data.table)
library(lubridate)
NO <- c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB", "BBB")
T_DATE <- c( "10/08/2019 17:02:00", "10/08/2019 14:45:00", "10/08/2019 03:23:00", "09/08/2019 17:06:00", "08/08/2019 23:29:00", "08/08/2019 09:34:00", "07/08/2019 23:05:00", "07/08/2019 12:07:00", "06/08/2019 22:06:00", "06/08/2019 10:07:00")
L_DATE <- c( "10/08/2019 20:35:00", "10/08/2019 15:10:00","10/08/2019 10:25:00", "10/08/2019 01:11:00","09/08/2019 10:27:00", "08/08/2019 21:19:00","08/08/2019 06:09:00","07/08/2019 20:25:00", "07/08/2019 08:53:00", "06/08/2019 19:44:00")
df <- data.frame(NO, T_DATE, L_DATE)
rm(DIFF,L_DATE,NO,T_DATE)
I have no clue how to add that step where L_DATE from the top row is used and the subsequent T_Date (row 2) so the first calculation is 10/08/2019 20:35:00 - 10/08/2019 14:45:00
if both NOs are the same.
Upvotes: 0
Views: 97
Reputation: 39647
In base you can convert the T_DATE and L_DATE to POSIXct
:
df$T_DATE <- as.POSIXct(df$T_DATE, format = "%d/%m/%Y %H:%M:%S")
df$L_DATE <- as.POSIXct(df$L_DATE, format = "%d/%m/%Y %H:%M:%S")
Then you can use ifelse
as you have used if in Excel. Instead of A2 you can use df[1,1]
or df$NO[1]
and instead of A3 df[2,1]
or df$NO[2]
. To make this for all rows df[1,1]
becomes to df[1:10,1]
or df[,1]
. So the following will calculate column DIFF:
df$DIFF <- ifelse(df[,1] == df[2:11,1], df[,3] - df[2:11,2], NA)
df
# NO T_DATE L_DATE DIFF
#1 AAA 2019-08-10 17:02:00 2019-08-10 20:35:00 5.833333
#2 AAA 2019-08-10 14:45:00 2019-08-10 15:10:00 11.783333
#3 AAA 2019-08-10 03:23:00 2019-08-10 10:25:00 17.316667
#4 AAA 2019-08-09 17:06:00 2019-08-10 01:11:00 25.700000
#5 AAA 2019-08-08 23:29:00 2019-08-09 10:27:00 NA
#6 BBB 2019-08-08 09:34:00 2019-08-08 21:19:00 22.233333
#7 BBB 2019-08-07 23:05:00 2019-08-08 06:09:00 18.033333
#8 BBB 2019-08-07 12:07:00 2019-08-07 20:25:00 22.316667
#9 BBB 2019-08-06 22:06:00 2019-08-07 08:53:00 22.766667
#10 BBB 2019-08-06 10:07:00 2019-08-06 19:44:00 NA
Here you get NA
for the last row of a group. In case you want a 0
you can use:
df$DIFF <- ifelse(df[,1] == df[2:11,1] & !is.na(df[2:11,1]), df[,3] - df[2:11,2], 0)
To make it more readable and generalized for different table lengths you can introduce a shifted index i
:
i <- seq_len(nrow(df)) + 1
df$DIFF <- ifelse(df$NO == df$NO[i], df$L_DATE - df$T_DATE[i], NA)
To ensure that the time difference is in hours use difftime
with units = "hours"
.
df$DIFF <- ifelse(df$NO == df$NO[i], difftime(df$L_DATE, df$T_DATE[i], units = "hours"), NA)
Upvotes: 1
Reputation: 5956
Apologies, misread your formula as having C2-C3
, not C2-B3
. Slight amendment below to correct for that.
library(dplyr)
library(lubridate)
df <- data.frame(
NO = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB", "BBB"),
T_DATE = dmy_hms(c( "10/08/2019 17:02:00", "10/08/2019 14:45:00", "10/08/2019 03:23:00", "09/08/2019 17:06:00", "08/08/2019 23:29:00", "08/08/2019 09:34:00", "07/08/2019 23:05:00", "07/08/2019 12:07:00", "06/08/2019 22:06:00", "06/08/2019 10:07:00")),
L_DATE = dmy_hms(c( "10/08/2019 20:35:00", "10/08/2019 15:10:00","10/08/2019 10:25:00", "10/08/2019 01:11:00","09/08/2019 10:27:00", "08/08/2019 21:19:00","08/08/2019 06:09:00","07/08/2019 20:25:00", "07/08/2019 08:53:00", "06/08/2019 19:44:00"))
)
df %>%
group_by(NO) %>%
mutate(DIFF = difftime(L_DATE, lead(T_DATE), units = "hours"))
#> # A tibble: 10 x 4
#> # Groups: NO [2]
#> NO T_DATE L_DATE DIFF
#> <fct> <dttm> <dttm> <drtn>
#> 1 AAA 2019-08-10 17:02:00 2019-08-10 20:35:00 5.833333 hours
#> 2 AAA 2019-08-10 14:45:00 2019-08-10 15:10:00 11.783333 hours
#> 3 AAA 2019-08-10 03:23:00 2019-08-10 10:25:00 17.316667 hours
#> 4 AAA 2019-08-09 17:06:00 2019-08-10 01:11:00 25.700000 hours
#> 5 AAA 2019-08-08 23:29:00 2019-08-09 10:27:00 NA hours
#> 6 BBB 2019-08-08 09:34:00 2019-08-08 21:19:00 22.233333 hours
#> 7 BBB 2019-08-07 23:05:00 2019-08-08 06:09:00 18.033333 hours
#> 8 BBB 2019-08-07 12:07:00 2019-08-07 20:25:00 22.316667 hours
#> 9 BBB 2019-08-06 22:06:00 2019-08-07 08:53:00 22.766667 hours
#> 10 BBB 2019-08-06 10:07:00 2019-08-06 19:44:00 NA hours
Upvotes: 6
Reputation: 3994
Using tidyverse
:
df %>%
mutate(T_DATE = as.POSIXct(T_DATE, format = "%m/%d/%Y %H:%M:%S"),
L_DATE = as.POSIXct(L_DATE, format = "%m/%d/%Y %H:%M:%S"),
diff_r = ifelse(NO == lead(NO), difftime(L_DATE, lead(T_DATE), units = "hours"), 0))
I convert your data to date-time format using as.POSIXct
then I take your equation and apply it resulting:
NO T_DATE L_DATE DIFF diff_r
AAA 2019-10-08 17:02:00 2019-10-08 20:35:00 5.83 5.833333
AAA 2019-10-08 14:45:00 2019-10-08 15:10:00 11.78 11.783333
AAA 2019-10-08 03:23:00 2019-10-08 10:25:00 17.32 713.316667
AAA 2019-09-08 17:06:00 2019-10-08 01:11:00 25.70 1441.700000
AAA 2019-08-08 23:29:00 2019-09-08 10:27:00 0.00 0.000000
BBB 2019-08-08 09:34:00 2019-08-08 21:19:00 22.23 742.233333
BBB 2019-07-08 23:05:00 2019-08-08 06:09:00 18.03 738.033333
BBB 2019-07-08 12:07:00 2019-07-08 20:25:00 22.32 718.316667
BBB 2019-06-08 22:06:00 2019-07-08 08:53:00 22.77 718.766667
BBB 2019-06-08 10:07:00 2019-06-08 19:44:00 0.00 NA
I'm American, and assumed your data is M/D/Y, but if its not, then you can convert it to D/M/Y.
Upvotes: 0
Reputation: 16178
Alternatively to difftime
, you can use ifelse
:
library(lubridate)
df$T_DATE = mdy_hms(T_DATE)
df$L_DATE = mdy_hms(L_DATE)
library(tidyverse)
df %>% mutate(Diff = ifelse(lead(NO) == NO,L_DATE-lead(T_DATE),0))
NO T_DATE L_DATE Diff
1 AAA 2019-10-08 17:02:00 2019-10-08 20:35:00 5.833333
2 AAA 2019-10-08 14:45:00 2019-10-08 15:10:00 11.783333
3 AAA 2019-10-08 03:23:00 2019-10-08 10:25:00 713.316667
4 AAA 2019-09-08 17:06:00 2019-10-08 01:11:00 1441.700000
5 AAA 2019-08-08 23:29:00 2019-09-08 10:27:00 0.000000
6 BBB 2019-08-08 09:34:00 2019-08-08 21:19:00 742.233333
7 BBB 2019-07-08 23:05:00 2019-08-08 06:09:00 738.033333
8 BBB 2019-07-08 12:07:00 2019-07-08 20:25:00 718.316667
9 BBB 2019-06-08 22:06:00 2019-07-08 08:53:00 718.766667
10 BBB 2019-06-08 10:07:00 2019-06-08 19:44:00 NA
Upvotes: 0