How can I compare two rows in R data frame by different columns and perform an operation on them?

Question

I have some data with a similar structure.

test_data <- data.frame(start = c(1,15,17,35),
                         time = c(87, 1, 35, 3),
                         end = c(88,16,52,38))
test_data
  start   time end
1     1     88  87
2    15      1  16
3    17     35  52
4    35      3  38

I want to compare the "start" variable with the "end" variable of the previous row. If the difference is less then 2 I want to sum them by time, leaving the start from the first row and end of the second row.

So in this test data start in the third observation is 17 and end in the second is 16. The difference is 1 and thus I want to sum them.

I expect such an output

    start time end
1     1     88  87
2    15     36  52
3    35      3  38

Is there a neat way to do this in R? I tried writing a for loop, but it seems that I am overcomplicating it a lot.

user12728748 · Accepted Answer

You could use data.table with a combination of shift and cumsum:

library(data.table)
test_data <- data.frame(start = c(1,15,17,35),
                        time = c(87, 1, 35, 3),
                        end = c(88,16,52,38))

setDT(test_data)
test_data[, gid:=ifelse(.I>1 & shift(end) - start < 2, FALSE, TRUE)]
test_data[,.(start=head(start,1), time=sum(time), end=tail(end,1)), by=cumsum(gid)][,-"cumsum"]
#>    start time end
#> 1:     1   87  88
#> 2:    15   36  52
#> 3:    35    3  38

^{Created on 2021-01-31 by the reprex package (v1.0.0)}

Note that this would also merge multiple consecutive rows (not just 2) with <2 distance between previous end and current start values.

How can I compare two rows in R data frame by different columns and perform an operation on them?

Answers (1)

Related Questions