Reputation: 3
I have some data with a similar structure.
test_data <- data.frame(start = c(1,15,17,35),
time = c(87, 1, 35, 3),
end = c(88,16,52,38))
test_data
start time end
1 1 88 87
2 15 1 16
3 17 35 52
4 35 3 38
I want to compare the "start" variable with the "end" variable of the previous row. If the difference is less then 2 I want to sum them by time, leaving the start from the first row and end of the second row.
So in this test data start in the third observation is 17 and end in the second is 16. The difference is 1 and thus I want to sum them.
I expect such an output
start time end
1 1 88 87
2 15 36 52
3 35 3 38
Is there a neat way to do this in R? I tried writing a for loop, but it seems that I am overcomplicating it a lot.
Upvotes: 0
Views: 170
Reputation: 8506
You could use data.table
with a combination of shift
and cumsum
:
library(data.table)
test_data <- data.frame(start = c(1,15,17,35),
time = c(87, 1, 35, 3),
end = c(88,16,52,38))
setDT(test_data)
test_data[, gid:=ifelse(.I>1 & shift(end) - start < 2, FALSE, TRUE)]
test_data[,.(start=head(start,1), time=sum(time), end=tail(end,1)), by=cumsum(gid)][,-"cumsum"]
#> start time end
#> 1: 1 87 88
#> 2: 15 36 52
#> 3: 35 3 38
Created on 2021-01-31 by the reprex package (v1.0.0)
Note that this would also merge multiple consecutive rows (not just 2) with <2 distance between previous end and current start values.
Upvotes: 1