Ape

Reputation: 1169

Map values to non-overlapping intervals

I have a set of non-overlapping intervals, each with an ID. Given a number, I would like to assign it an ID depending on the interval which it belongs to (NA if no such interval exists).

intervals_id <- structure(list(ID = c(851, 852, 999), Lower = c(85101, 85201, 
    85301), Upper = c(85104, 85206, 85699)), .Names = c("ID", "Lower", 
    "Upper"), row.names = c(NA, -3L), class = "data.frame")

#    ID Lower Upper
# 1 851 85101 85104
# 2 852 85201 85206
# 3 999 85301 85699

value <- c(15555, 85102, 85201, 85206, 85207, 85600, 86999)

I put together something using cut. It seems to work, but it feels messy. Is there a more elegant and straightforward solution?

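# Sort by lower bound and nudge each upper bound so it falls inside its own interval
intervals_id <- intervals_id[order(intervals_id$Lower),]
intervals_id$UpperP <- intervals_id$Upper + 0.01
# Interleave the bounds (Lower1, UpperP1, Lower2, ...) and use them as breaks
position <- as.numeric(cut(value, breaks = 
    as.numeric(t(as.matrix(intervals_id[,c("Lower", "UpperP")]))), right = FALSE))
# Even bins are the gaps between intervals; odd bin k maps to row (k + 1) / 2
position[position %% 2 == 0] <- NA
position <- (position + 1) %/% 2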

# desired result
data.frame(value, valueID = intervals_id$ID[position])

#   value valueID
# 1 15555      NA
# 2 85102     851
# 3 85201     852
# 4 85206     852
# 5 85207      NA
# 6 85600     999
# 7 86999      NA

Upvotes: 2

Views: 339

Answers (2)

Sotos

Reputation: 51592

Another option is a data.table / base R hybrid that uses data.table::between:

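# For each value, pick the ID of the interval that contains it (NA if none does)
sapply(value, function(i) {i1 = intervals_id$ID[data.table::between(i, intervals_id$Lower, intervals_id$Upper)]; 
                           if (length(i1) == 0){NA}else{i1}})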

#[1]  NA 851 852 852  NA 999  NA
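
If you would rather avoid the data.table dependency, the same lookup can be sketched in base R with findInterval(). This is only a sketch under the question's assumption that the intervals do not overlap and that both bounds are inclusive; the names ord, idx and valueID are made up for the example.

```r
intervals_id <- data.frame(ID    = c(851, 852, 999),
                           Lower = c(85101, 85201, 85301),
                           Upper = c(85104, 85206, 85699))
value <- c(15555, 85102, 85201, 85206, 85207, 85600, 86999)

# findInterval() needs the lower bounds in increasing order
ord <- intervals_id[order(intervals_id$Lower), ]

# Index of the last interval whose Lower is <= value (0 if there is none)
idx <- findInterval(value, ord$Lower)
idx[idx == 0] <- NA

# Keep the candidate only if the value does not exceed its upper bound
valueID <- ifelse(value <= ord$Upper[idx], ord$ID[idx], NA)
valueID
# [1]  NA 851 852 852  NA 999  NA
```

Because it is vectorised, this avoids the per-value sapply loop, but it relies on the intervals being disjoint: with overlapping intervals only the one with the largest lower bound is checked.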

Upvotes: 2

pogibas

Reputation: 28339

You can use the foverlaps() function from the data.table package. It finds overlaps between two sets of intervals.

First we need to create data.tables and set keys for them.

library(data.table)

# Using OP's data
setDT(intervals_id)
setkey(intervals_id, Lower, Upper)

# Create dummy intervals (same coordinate) and set key
valueDT <- data.table(start = value, end = value)
setkey(valueDT, start, end)

Next, apply foverlaps() function:

foverlaps(valueDT, intervals_id)[, .(value = start, ID)]

Result:

#    value  ID
# 1: 15555  NA
# 2: 85102 851
# 3: 85201 852
# 4: 85206 852
# 5: 85207  NA
# 6: 85600 999
# 7: 86999  NA

PS. The full foverlaps() output looks like this:

    ID Lower Upper start   end
1:  NA    NA    NA 15555 15555
2: 851 85101 85104 85102 85102
3: 852 85201 85206 85201 85201
4: 852 85201 85206 85206 85206
5:  NA    NA    NA 85207 85207
6: 999 85301 85699 85600 85600
7:  NA    NA    NA 86999 86999

If needed you can play around with foverlaps options.

  • Use nomatch to filter out intervals without overlaps
  • Use mult to report "all", "first" or "last" overlap
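
As a rough illustration of those options, the sketch below rebuilds the question's data and drops the values that fall outside every interval. The name matched is made up for the example, and nomatch = 0L is used on the assumption that it is accepted by your data.table version (newer releases document nomatch = NULL for the same purpose).

```r
library(data.table)

intervals_id <- data.table(ID    = c(851, 852, 999),
                           Lower = c(85101, 85201, 85301),
                           Upper = c(85104, 85206, 85699))
setkey(intervals_id, Lower, Upper)

value <- c(15555, 85102, 85201, 85206, 85207, 85600, 86999)
valueDT <- data.table(start = value, end = value)
setkey(valueDT, start, end)

# nomatch = 0L drops rows of valueDT that overlap no interval;
# mult = "first" would matter only if intervals overlapped, which they do not here
matched <- foverlaps(valueDT, intervals_id,
                     nomatch = 0L, mult = "first")[, .(value = start, ID)]
matched
#    value  ID
# 1: 85102 851
# 2: 85201 852
# 3: 85206 852
# 4: 85600 999
```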

Upvotes: 2
