Reputation: 781
I want to check for overlapping intervals within each ID. Here is the data:
ID <- c(rep(1,3), rep(3, 5), rep(4,4),rep(5,5))
Begin <- c(0,2.5,3,7,8,7,25,25,10,15,17,20,1,NA,10,11,13)
End <- c(1.5,3.5,6,12,8,11,29,35, 12,19,NA,28,5,20,30,20,25)
df <- data.frame(ID, Begin, End)
df
ID Begin End
1 1 0.0 1.5
2 1 2.5 3.5
3 1 3.0 6.0*
4 3 7.0 12.0
5 3 8.0 8.0*
6 3 7.0 11.0*
7 3 25.0 29.0
8 3 25.0 35.0*
9 4 10.0 12.0
10 4 15.0 19.0
11 4 17.0 NA*
12 4 20.0 28.0
13 5 1.0 5.0
14 5 NA 20.0
15 5 10.0 30.0
16 5 11.0 20.0*
17 5 13.0 25.0*
* marks a row whose interval overlaps an earlier interval with the same ID.
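Here is the output I want. For each overlapping row, Begin_New is the largest End among the earlier rows of the same ID; otherwise it equals Begin: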
ID Begin End Begin_New1
1 1 0.0 1.5 0.0
2 1 2.5 3.5 2.5
3 1 3.0 6.0 3.5*
4 3 7.0 12.0 7.0
5 3 8.0 8.0 12.0*
6 3 7.0 11.0 12.0*
7 3 25.0 29.0 25.0
8 3 25.0 35.0 29.0*
9 4 10.0 12.0 10.0
10 4 15.0 19.0 15.0
11 4 17.0 NA 19.0*
12 4 20.0 28.0 20.0
13 5 1.0 5.0 1.0
14 5 NA 20.0 NA
15 5 10.0 30.0 20.0*
16 5 11.0 20.0 30.0*
17 5 13.0 25.0 30.0*
When I use the code below, I don't get the output I want. It shifts by only one row, so each row is compared only with the row immediately before it:
library(data.table)
setDT(df)[, Begin_New := shift(End), by = ID][!which(Begin < Begin_New), Begin_New := Begin]
ID Begin End Begin_New
1: 1 0.0 1.5 0.0
2: 1 2.5 3.5 2.5
3: 1 3.0 6.0 3.5
4: 3 7.0 12.0 7.0
5: 3 8.0 8.0 12.0
6: 3 7.0 11.0 8.0
7: 3 25.0 29.0 25.0
8: 3 25.0 35.0 29.0
9: 4 10.0 12.0 10.0
10: 4 15.0 19.0 15.0
11: 4 17.0 NA 19.0
12: 4 20.0 28.0 20.0
13: 5 1.0 5.0 1.0
14: 5 NA 20.0 NA
15: 5 10.0 30.0 20.0
16: 5 11.0 20.0 30.0
17: 5 13.0 25.0 20.0
This is the output I get, which is not what I want (see rows 6 and 17):
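For illustration, here is a small base R comparison for ID 3 (End values copied from the data above) of what a one-row lag gives versus the running maximum of all earlier End values. The variable names are just for this sketch:

```r
# End values for ID 3, copied from the data above
end_id3 <- c(12, 8, 11, 29, 35)

# What shift(End) gives: only the immediately preceding End (lag of 1)
prev_end <- c(NA, head(end_id3, -1))

# What the desired output needs: the largest End among all earlier rows
max_so_far <- c(NA, head(cummax(end_id3), -1))

print(rbind(prev_end, max_so_far))
```

In the third position the lag gives 8 while the running maximum gives 12, which is why row 6 comes out as 8.0 instead of 12.0.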
Upvotes: 3
Views: 137
Reputation: 66819
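I think your code is pretty much right; you just need to use cummax, so that each row is compared with the largest End seen so far in its ID rather than only the previous row's End: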
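library(data.table)
setDT(df)
df[, Begin_New := {
  # largest End among the earlier rows of this ID; the first row gets its own Begin
  # (cummax returns NA from the first missing End onward, so those rows stay unchanged)
  high_so_far = shift(cummax(End), fill=Begin[1L])
  # rows that start before an earlier interval has ended
  w = which(Begin < high_so_far)
  Begin[w] = high_so_far[w]
  Begin
}, by=ID]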
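If you want to check the same running-maximum rule without data.table, here is a minimal base R sketch. It rebuilds the question's data and loops over the IDs; it is only an illustration of the idea, not a replacement for the data.table version:

```r
# Rebuild the example data from the question
ID    <- c(rep(1, 3), rep(3, 5), rep(4, 4), rep(5, 5))
Begin <- c(0, 2.5, 3, 7, 8, 7, 25, 25, 10, 15, 17, 20, 1, NA, 10, 11, 13)
End   <- c(1.5, 3.5, 6, 12, 8, 11, 29, 35, 12, 19, NA, 28, 5, 20, 30, 20, 25)
df <- data.frame(ID, Begin, End)

df$Begin_New <- df$Begin
for (id in unique(df$ID)) {
  idx <- which(df$ID == id)
  b <- df$Begin[idx]
  e <- df$End[idx]
  # largest End of all earlier rows in this ID (first row compared with itself)
  high_so_far <- c(b[1L], head(cummax(e), -1L))
  # push Begin up only where the interval starts before that value
  w <- which(b < high_so_far)
  b[w] <- high_so_far[w]
  df$Begin_New[idx] <- b
}

print(df)
```

As in the data.table answer, cummax returns NA from the first missing End onward, so row 12 is left unchanged; here that happens to match the desired output.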
Upvotes: 6