Kalenji
Kalenji

Reputation: 407

Subtract values from different row

I have been trying to move my calculations from excel to R and just wonder if there is a way to replicate IF (that has a step).

So my data is below and the results I get in excel with simple formula are in DIFF column (=IF(A2=A3, (C2-B3) * 24, 0))

NO  T_DATE              L_DATE              DIFF
AAA 10/08/2019 17:02:00 10/08/2019 20:35:00 5.83
AAA 10/08/2019 14:45:00 10/08/2019 15:10:00 11.78
AAA 10/08/2019 03:23:00 10/08/2019 10:25:00 17.32
AAA 09/08/2019 17:06:00 10/08/2019 01:11:00 25.70
AAA 08/08/2019 23:29:00 09/08/2019 10:27:00 0
BBB 08/08/2019 09:34:00 08/08/2019 21:19:00 22.23
BBB 07/08/2019 23:05:00 08/08/2019 06:09:00 18.03
BBB 07/08/2019 12:07:00 07/08/2019 20:25:00 22.32
BBB 06/08/2019 22:06:00 07/08/2019 08:53:00 22.77
BBB 06/08/2019 10:07:00 06/08/2019 19:44:00 0

I have been trying in R with nil luck. Code to get the data frame is below:

library(data.table)
library(lubridate)

NO <- c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB", "BBB")
T_DATE <- c( "10/08/2019 17:02:00",  "10/08/2019 14:45:00", "10/08/2019 03:23:00",  "09/08/2019 17:06:00", "08/08/2019 23:29:00",  "08/08/2019 09:34:00", "07/08/2019 23:05:00", "07/08/2019 12:07:00", "06/08/2019 22:06:00", "06/08/2019 10:07:00")

L_DATE <- c( "10/08/2019 20:35:00", "10/08/2019 15:10:00","10/08/2019 10:25:00", "10/08/2019 01:11:00","09/08/2019 10:27:00", "08/08/2019 21:19:00","08/08/2019 06:09:00","07/08/2019 20:25:00", "07/08/2019 08:53:00", "06/08/2019 19:44:00")

df <- data.frame(NO, T_DATE, L_DATE)

rm(DIFF,L_DATE,NO,T_DATE)

I have no clue how to add that step where L_DATE from the top row is used and the subsequent T_Date (row 2) so the first calculation is 10/08/2019 20:35:00 - 10/08/2019 14:45:00 if both NOs are the same.

Upvotes: 0

Views: 97

Answers (4)

GKi
GKi

Reputation: 39647

In base you can convert the T_DATE and L_DATE to POSIXct:

df$T_DATE <- as.POSIXct(df$T_DATE, format = "%d/%m/%Y  %H:%M:%S")
df$L_DATE <- as.POSIXct(df$L_DATE, format = "%d/%m/%Y  %H:%M:%S")

Then you can use ifelse as you have used if in Excel. Instead of A2 you can use df[1,1] or df$NO[1] and instead of A3 df[2,1] or df$NO[2]. To make this for all rows df[1,1] becomes to df[1:10,1] or df[,1]. So the following will calculate column DIFF:

df$DIFF  <- ifelse(df[,1] == df[2:11,1], df[,3] - df[2:11,2], NA)
df
#    NO              T_DATE              L_DATE      DIFF
#1  AAA 2019-08-10 17:02:00 2019-08-10 20:35:00  5.833333
#2  AAA 2019-08-10 14:45:00 2019-08-10 15:10:00 11.783333
#3  AAA 2019-08-10 03:23:00 2019-08-10 10:25:00 17.316667
#4  AAA 2019-08-09 17:06:00 2019-08-10 01:11:00 25.700000
#5  AAA 2019-08-08 23:29:00 2019-08-09 10:27:00        NA
#6  BBB 2019-08-08 09:34:00 2019-08-08 21:19:00 22.233333
#7  BBB 2019-08-07 23:05:00 2019-08-08 06:09:00 18.033333
#8  BBB 2019-08-07 12:07:00 2019-08-07 20:25:00 22.316667
#9  BBB 2019-08-06 22:06:00 2019-08-07 08:53:00 22.766667
#10 BBB 2019-08-06 10:07:00 2019-08-06 19:44:00        NA

Here you get NA for the last row of a group. In case you want a 0 you can use:

df$DIFF  <- ifelse(df[,1] == df[2:11,1] & !is.na(df[2:11,1]), df[,3] - df[2:11,2], 0)

To make it more readable and generalized for different table lengths you can introduce a shifted index i:

i  <- seq_len(nrow(df)) + 1
df$DIFF  <- ifelse(df$NO == df$NO[i], df$L_DATE - df$T_DATE[i], NA)

To ensure that the time difference is in hours use difftime with units = "hours".

df$DIFF  <- ifelse(df$NO == df$NO[i], difftime(df$L_DATE, df$T_DATE[i], units = "hours"), NA)

Upvotes: 1

caldwellst
caldwellst

Reputation: 5956

Apologies, misread your formula as having C2-C3, not C2-B3. Slight amendment below to correct for that.

library(dplyr)
library(lubridate)

df <- data.frame(
  NO = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB", "BBB"),
  T_DATE = dmy_hms(c( "10/08/2019 17:02:00",  "10/08/2019 14:45:00", "10/08/2019 03:23:00",  "09/08/2019 17:06:00", "08/08/2019 23:29:00",  "08/08/2019 09:34:00", "07/08/2019 23:05:00", "07/08/2019 12:07:00", "06/08/2019 22:06:00", "06/08/2019 10:07:00")),
  L_DATE = dmy_hms(c( "10/08/2019 20:35:00", "10/08/2019 15:10:00","10/08/2019 10:25:00", "10/08/2019 01:11:00","09/08/2019 10:27:00", "08/08/2019 21:19:00","08/08/2019 06:09:00","07/08/2019 20:25:00", "07/08/2019 08:53:00", "06/08/2019 19:44:00"))
)

df %>% 
  group_by(NO) %>% 
  mutate(DIFF = difftime(L_DATE, lead(T_DATE), units = "hours"))
#> # A tibble: 10 x 4
#> # Groups:   NO [2]
#>    NO    T_DATE              L_DATE              DIFF           
#>    <fct> <dttm>              <dttm>              <drtn>         
#>  1 AAA   2019-08-10 17:02:00 2019-08-10 20:35:00  5.833333 hours
#>  2 AAA   2019-08-10 14:45:00 2019-08-10 15:10:00 11.783333 hours
#>  3 AAA   2019-08-10 03:23:00 2019-08-10 10:25:00 17.316667 hours
#>  4 AAA   2019-08-09 17:06:00 2019-08-10 01:11:00 25.700000 hours
#>  5 AAA   2019-08-08 23:29:00 2019-08-09 10:27:00        NA hours
#>  6 BBB   2019-08-08 09:34:00 2019-08-08 21:19:00 22.233333 hours
#>  7 BBB   2019-08-07 23:05:00 2019-08-08 06:09:00 18.033333 hours
#>  8 BBB   2019-08-07 12:07:00 2019-08-07 20:25:00 22.316667 hours
#>  9 BBB   2019-08-06 22:06:00 2019-08-07 08:53:00 22.766667 hours
#> 10 BBB   2019-08-06 10:07:00 2019-08-06 19:44:00        NA hours

Upvotes: 6

akash87
akash87

Reputation: 3994

Using tidyverse:

df %>% 
mutate(T_DATE = as.POSIXct(T_DATE, format = "%m/%d/%Y  %H:%M:%S"), 
       L_DATE = as.POSIXct(L_DATE, format = "%m/%d/%Y  %H:%M:%S"), 
       diff_r = ifelse(NO == lead(NO), difftime(L_DATE, lead(T_DATE), units = "hours"), 0))

I convert your data to date-time format using as.POSIXct then I take your equation and apply it resulting:

 NO              T_DATE              L_DATE  DIFF      diff_r
AAA 2019-10-08 17:02:00 2019-10-08 20:35:00  5.83    5.833333
AAA 2019-10-08 14:45:00 2019-10-08 15:10:00 11.78   11.783333
AAA 2019-10-08 03:23:00 2019-10-08 10:25:00 17.32  713.316667
AAA 2019-09-08 17:06:00 2019-10-08 01:11:00 25.70 1441.700000
AAA 2019-08-08 23:29:00 2019-09-08 10:27:00  0.00    0.000000
BBB 2019-08-08 09:34:00 2019-08-08 21:19:00 22.23  742.233333
BBB 2019-07-08 23:05:00 2019-08-08 06:09:00 18.03  738.033333
BBB 2019-07-08 12:07:00 2019-07-08 20:25:00 22.32  718.316667
BBB 2019-06-08 22:06:00 2019-07-08 08:53:00 22.77  718.766667
BBB 2019-06-08 10:07:00 2019-06-08 19:44:00  0.00          NA

I'm American, and assumed your data is M/D/Y, but if its not, then you can convert it to D/M/Y.

Upvotes: 0

dc37
dc37

Reputation: 16178

Alternatively to difftime, you can use ifelse :

library(lubridate)
df$T_DATE = mdy_hms(T_DATE)
df$L_DATE = mdy_hms(L_DATE)  

library(tidyverse)
df %>% mutate(Diff = ifelse(lead(NO) == NO,L_DATE-lead(T_DATE),0))

    NO              T_DATE              L_DATE        Diff
1  AAA 2019-10-08 17:02:00 2019-10-08 20:35:00    5.833333
2  AAA 2019-10-08 14:45:00 2019-10-08 15:10:00   11.783333
3  AAA 2019-10-08 03:23:00 2019-10-08 10:25:00  713.316667
4  AAA 2019-09-08 17:06:00 2019-10-08 01:11:00 1441.700000
5  AAA 2019-08-08 23:29:00 2019-09-08 10:27:00    0.000000
6  BBB 2019-08-08 09:34:00 2019-08-08 21:19:00  742.233333
7  BBB 2019-07-08 23:05:00 2019-08-08 06:09:00  738.033333
8  BBB 2019-07-08 12:07:00 2019-07-08 20:25:00  718.316667
9  BBB 2019-06-08 22:06:00 2019-07-08 08:53:00  718.766667
10 BBB 2019-06-08 10:07:00 2019-06-08 19:44:00          NA

Upvotes: 0

Related Questions