Nivel

Reputation: 679

Find closest non-overlapping ranges from start to end

I would like to find the closest ranges that do not overlap, going from the first start position to the last end position. Any idea how to proceed? In the example below, c(8, 33) and c(155, 161) should be filtered out because they overlap with the preceding range. Ranges that merely touch, such as c(125, 155) and c(155, 161), count as overlapping.

#Example data
df <- data.frame(
  start = c(7,8,14,34,67,92,125,155,170,200),
  end = c(13,33,25,66,91,124,155,161,181,214)
)

   start end
1      7  13
2      8  33
3     14  25
4     34  66
5     67  91
6     92 124
7    125 155
8    155 161
9    170 181
10   200 214

#Overlapping rows
  start end
1     8  33
2   155 161

#Desired output where overlapping rows are filtered away
  start end
1     7  13
2    14  25
3    34  66
4    67  91
5    92 124
6   125 155
7   170 181
8   200 214
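A one-shot vectorised comparison of each start with the previous row's end is tempting, but it does not reproduce the desired output, because it also compares rows against rows that are themselves going to be dropped. A minimal base R sketch, with the example `df` recreated so the block runs on its own:

```r
# Example data from the question
df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214)
)

# Naive check: does each row start at or before the end of the row directly above it?
prev_end <- c(-Inf, head(df$end, -1))
naive_overlap <- df$start <= prev_end

which(naive_overlap)
#> [1] 2 3 8
```

Row 3, c(14, 25), is flagged only because it starts before row 2 ends, and row 2 is dropped anyway; compared with row 1 (end 13) it does not overlap. That is why the rows have to be processed one at a time.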

Upvotes: 3

Views: 266

Answers (3)

ThomasIsCoding

Reputation: 101064

Since your start column is already sorted in ascending order, you only need to compare each row's start with the end of the row before it, e.g.,

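repeat {
  # Row i (i >= 2) overlaps when the previous row's end is not below its start;
  # take the first such row, if any
  ind <- with(df, head(which(!c(TRUE, end[-nrow(df)] < start[-1])), 1))
  if (!length(ind)) break  # no overlaps left
  df <- df[-ind, ]         # drop that row and check again
}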

which gives

> df
   start end
1      7  13
3     14  25
4     34  66
5     67  91
6     92 124
7    125 155
9    170 181
10   200 214
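The loop above relies on `start` being sorted. If the input might not be sorted, one option is to order it by `start` first. A minimal sketch wrapping the same repeat loop in a function; the name `drop_overlaps_sorted` is hypothetical, and the shuffled input is made up only for illustration:

```r
drop_overlaps_sorted <- function(df) {
  # Sort by start (then end) so that only adjacent rows need comparing
  df <- df[order(df$start, df$end), ]
  repeat {
    n <- nrow(df)
    # Position i (i >= 2) is TRUE when row i starts after row i - 1 ends
    ok <- c(TRUE, df$end[-n] < df$start[-1])
    ind <- head(which(!ok), 1)  # first overlapping row, if any
    if (!length(ind)) break
    df <- df[-ind, ]
  }
  df
}

df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214)
)
shuffled <- df[c(5, 2, 9, 1, 10, 3, 8, 4, 7, 6), ]
res <- drop_overlaps_sorted(shuffled)
res$start
#> [1]   7  14  34  67  92 125 170 200
```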

Upvotes: 0

Nivel

Reputation: 679

I went with the following answer, posted on RStudio Community. It returns both the non-overlapping and the overlapping rows:

find_nonover <- function(df) {
  to_drop <- logical(nrow(df))
  for (i in seq_along(df[["end"]])) {
    if (to_drop[i]) next  # a dropped row cannot cause later rows to be dropped
    # Drop every later row that starts at or before the end of kept row i
    to_drop <- to_drop | c(logical(i), df[i, "end"] >= df[["start"]][-seq_len(i)])
  }
  list(nonover = df[!to_drop, ],
       over    = df[to_drop, ])
}

https://community.rstudio.com/t/find-closest-non-overlapping-ranges-from-start-to-end/79642/3
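Because the kept ranges never overlap one another and are visited in order of `start`, it is enough to remember the end of the last range kept; there is no need to mark every later row. A minimal single-pass sketch that returns the same two-part list as `find_nonover()`. The name `split_nonover` is hypothetical, and it assumes `start` is sorted in ascending order:

```r
split_nonover <- function(df) {
  keep <- logical(nrow(df))
  last_end <- -Inf                 # end of the most recently kept range
  for (i in seq_len(nrow(df))) {
    if (df$start[i] > last_end) {  # touching endpoints count as overlapping
      keep[i] <- TRUE
      last_end <- df$end[i]
    }
  }
  list(nonover = df[keep, ], over = df[!keep, ])
}

df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214)
)
res <- split_nonover(df)
rownames(res$over)
#> [1] "2" "8"
```

This does a single pass over the rows instead of comparing each kept row with all later rows.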

Upvotes: 0

Allan Cameron

Reputation: 173793

I would do this as a simple loop, since whether a row is excluded depends on which of the earlier rows were kept.

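i <- 2

# Compare each row with the previous kept row; dropped rows are removed,
# so row i - 1 is always the last range kept
while (i <= nrow(df)) {
  if (df$start[i] <= df$end[i - 1]) {
    df <- df[-i, ]   # overlap: drop row i and re-check the new row i
  } else {
    i <- i + 1
  }
}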

df
#>    start end
#> 1      7  13
#> 3     14  25
#> 4     34  66
#> 5     67  91
#> 6     92 124
#> 7    125 155
#> 9    170 181
#> 10   200 214
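A loop like this has to visit the last row too (`i <= nrow(df)`); with `i < nrow(df)` an overlap in the final row would go unchecked. A small sketch that wraps the loop in a hypothetical function `drop_overlaps_loop` and appends an extra final row, c(210, 220), made up purely to exercise that case:

```r
drop_overlaps_loop <- function(df) {
  i <- 2
  # Removed rows are deleted, so row i - 1 is always the last range kept
  while (i <= nrow(df)) {
    if (df$start[i] <= df$end[i - 1]) {
      df <- df[-i, ]  # overlap: drop row i and re-check the new row i
    } else {
      i <- i + 1
    }
  }
  df
}

df <- data.frame(
  start = c(7, 8, 14, 34, 67, 92, 125, 155, 170, 200, 210),
  end   = c(13, 33, 25, 66, 91, 124, 155, 161, 181, 214, 220)
)
res <- drop_overlaps_loop(df)
nrow(res)
#> [1] 8
```

With `i < nrow(df)` as the loop condition, the same call stops before checking the final row and keeps 9 rows, including c(210, 220).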

Upvotes: 2
