Reputation: 679
I would like to find the chain of closest non-overlapping ranges, going from the first start to the last end position. Any idea how to proceed? In the example below, c(8, 33) and c(155, 161) should be filtered out because they overlap with the preceding range (ranges that share an endpoint, such as 125-155 and 155-161, count as overlapping).
#Example data
df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214)
)
   start end
1      7  13
2      8  33
3     14  25
4     34  66
5     67  91
6     92 124
7    125 155
8    155 161
9    170 181
10   200 214
#Overlapping rows
  start end
1     8  33
2   155 161
#Desired output where overlapping rows are filtered away
  start end
1     7  13
2    14  25
3    34  66
4    67  91
5    92 124
6   125 155
7   170 181
8   200 214
Upvotes: 3
Views: 266
Reputation: 101064
Since your start column is already in ascending order, you can check for overlap using only the values of end, e.g.,
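repeat {
  # Index of the first row whose start is not strictly after the previous row's end
  ind <- with(df, head(which(!c(TRUE, end[-nrow(df)] < start[-1])), 1))
  # Stop once no overlapping row remains
  if (!length(ind)) break
  df <- df[-ind, ]
}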
which gives
> df
   start end
1      7  13
3     14  25
4     34  66
5     67  91
6     92 124
7    125 155
9    170 181
10   200 214
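The loop above relies on start being sorted. If that is not guaranteed, here is a minimal sketch of checking and enforcing the assumption before filtering. The data frame df_unsorted and the name df_sorted are made up for illustration and are not from the question.

```r
# Hypothetical unsorted input (illustration only, not the question's data)
df_unsorted <- data.frame(
  start = c(34, 7, 14, 8),
  end   = c(66, 13, 25, 33)
)

# is.unsorted() returns TRUE when the vector is not in non-decreasing order
if (is.unsorted(df_unsorted$start)) {
  # Sort by start, breaking ties by end
  df_sorted <- df_unsorted[order(df_unsorted$start, df_unsorted$end), ]
} else {
  df_sorted <- df_unsorted
}
```

df_sorted can then be used in place of df in the repeat loop.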
Upvotes: 0
Reputation: 679
I went with the following answer posted on the R community website:
find_nonover <- function(df) {
  to_drop <- logical(nrow(df))
  for (i in seq_along(df[["end"]])) {
    # Skip rows already marked as overlapping an earlier kept row
    if (i %in% which(to_drop)) next
    # Mark every later row that starts at or before the end of row i
    to_drop <- to_drop | c(logical(i), df[i, "end"] >= df[["start"]][-seq_len(i)])
  }
  list(nonover = df[!to_drop, ],
       over = df[to_drop, ])
}
https://community.rstudio.com/t/find-closest-non-overlapping-ranges-from-start-to-end/79642/3
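The function returns a list with two data frames rather than a single one. Below is a minimal sketch of calling it on the example data from the question. The function body is copied from the post above so that the block runs on its own, and res is just an illustrative name.

```r
# Example data from the question
df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214)
)

# Copied from the post above
find_nonover <- function(df) {
  to_drop <- logical(nrow(df))
  for (i in seq_along(df[["end"]])) {
    if (i %in% which(to_drop)) next
    to_drop <- to_drop | c(logical(i), df[i, "end"] >= df[["start"]][-seq_len(i)])
  }
  list(nonover = df[!to_drop, ],
       over = df[to_drop, ])
}

res <- find_nonover(df)
res$nonover  # rows 1, 3, 4, 5, 6, 7, 9, 10 of the original data
res$over     # rows 2 and 8: (8, 33) and (155, 161)
```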
Upvotes: 0
Reputation: 173793
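I would do this as a simple loop, since whether a row is excluded depends on the result of the calculation for the previous row:
i <- 2
# Use <= so that the last row is also checked against its predecessor
while (i <= nrow(df)) {
  if (df$start[i] <= df$end[i - 1]) {
    # Row i overlaps the previous remaining row: drop it and re-check the same index
    df <- df[-i, ]
  } else {
    i <- i + 1
  }
}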
df
#>    start end
#> 1      7  13
#> 3     14  25
#> 4     34  66
#> 5     67  91
#> 6     92 124
#> 7    125 155
#> 9    170 181
#> 10   200 214
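Dropping rows one at a time builds a new data frame on every deletion, which may get slow on large inputs. As a rough sketch, the same greedy rule can be written as a single pass that remembers the end of the last kept row and subsets once at the end. The helper name keep_nonoverlapping is hypothetical, and the code assumes df is sorted by start and has columns start and end.

```r
# Hypothetical helper: keep a row only if it starts strictly after the end
# of the last kept row. Assumes df is sorted by start.
keep_nonoverlapping <- function(df) {
  n <- nrow(df)
  if (n == 0) return(df)
  keep <- logical(n)
  keep[1] <- TRUE            # the first row is always kept
  last_end <- df$end[1]
  for (i in seq_len(n)[-1]) {
    if (df$start[i] > last_end) {
      keep[i] <- TRUE
      last_end <- df$end[i]  # later rows are compared against this row
    }
  }
  df[keep, , drop = FALSE]
}

# Example data from the question
df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214)
)
result <- keep_nonoverlapping(df)
```

On the example data this keeps rows 1, 3, 4, 5, 6, 7, 9 and 10, matching the output above.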
Upvotes: 2