Reputation: 267
Below is my data framedf
, which has 3 variables and approx 100k data points,
ID A B C
1 35 0 0
2 28 0 0
3 36 0 0
4 99 0 0
5 25 1 0
6 65 1 0
7 98 1 0
8 95 1 0
9 67 0 65
10 95 0 65
11 94 0 65
12 4 0 65
13 2 -1 0
14 62 -1 0
15 95 -1 0
16 25 -1 0
17 36 0 19.5
18 3 0 19.5
Now, I would like to create another variables D
which contains the difference between the Last data point of A
where B == 1 or -1
and the immediate next value in C
where B == 1 or -1
.
Expected output
ID A B C D
1 35 0 0 0
2 28 0 0 0
3 36 0 0 0
4 99 0 0 0
5 25 1 0 0
6 65 1 0 0
7 98 1 0 0
8 95 1 0 30
9 67 0 65 0
10 95 0 65 0
11 94 0 65 0
12 4 0 65 0
13 2 -1 0 0
14 62 -1 0 0
15 95 -1 0 0
16 25 -1 0 5.5
17 36 0 19.5 0
18 3 0 19.5 0
Note
Eg. D8 = A8-c9
, similarly D16 = A16 - C17
and this continues till the end of the data frame.
Update on the expected output
ID A B C D
1 35 0 0 0
2 28 0 0 0
3 36 0 0 0
4 99 0 0 0
5 25 1 0 0
6 65 1 0 0
7 98 1 0 0
8 95 1 0 30
9 67 0 65 0
10 95 0 65 0
11 94 1 65 0
12 4 0 65 0
13 2 -1 0 0
14 62 -1 0 0
15 95 -1 0 0
16 25 -1 0 5.5
17 36 0 19.5 0
18 3 0 19.5 0
19 5 0 19.5 0
20 68 1 19.5 0
21 17 0 0 0
Updated question
Another condition is that if the value in the column C
is same for the next 4 values as then the program should not consider to subtract the values A
and C
Eg, B11 ==1
it has the change of value to 1 but program should not consider that data point to find the difference between A11 and C12
because value of C
is still within the count of 4 when B
has changed. Similarly for the Data point at B20
.
Upvotes: 2
Views: 587
Reputation: 1338
Or with base R you could try:
dat$new <- 0
ind <- cumsum(rle(dat$B)$lengths)[rle(dat$B)$values%in%c(1,-1)]
dat[ind,"new"] <- dat[ind,"A"] - lead(dat[,"C"])[ind]
Upvotes: 1
Reputation: 887891
We can try with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), shift
the 'C' column to get the next value ('C1'), grouped by the run-length-id of 'B' and specifying i
(abs(B)==1
), we assign 'D' as the difference between the last values of 'A' and 'C1' (A[.N] - C1[.N]
), assign the columns that are not needed to NULL, and change the values in 'D' to 0 except for the last element determined by the grouping variable (rleid(B)
)
library(data.table)
setDT(df1)[, C1 := shift(C, type = 'lead')][abs(B)==1,
D := A[.N] - C1[.N], .(grp = rleid(B))][, C1 := NULL]
df1[df1[,.I[seq_len(.N) != .N] , rleid(B)]$V1, D := NA][is.na(D), D := 0][]
# ID A B C D
# 1: 1 35 0 0.0 0.0
# 2: 2 28 0 0.0 0.0
# 3: 3 36 0 0.0 0.0
# 4: 4 99 0 0.0 0.0
# 5: 5 25 1 0.0 0.0
# 6: 6 65 1 0.0 0.0
# 7: 7 98 1 0.0 0.0
# 8: 8 95 1 0.0 30.0
# 9: 9 67 0 65.0 0.0
#10: 10 95 0 65.0 0.0
#11: 11 94 0 65.0 0.0
#12: 12 4 0 65.0 0.0
#13: 13 2 -1 0.0 0.0
#14: 14 62 -1 0.0 0.0
#15: 15 95 -1 0.0 0.0
#16: 16 25 -1 0.0 5.5
#17: 17 36 0 19.5 0.0
#18: 18 3 0 19.5 0.0
Upvotes: 3