Find interval over similar column names in R

Question

I'd like a simpler way to find whether a value is within any of the given intervals. Given the dataframe:

Value  start1  start2  start3  end1  end2  end3
212    82      195     409     97    220   411
80     57      95      111     69    100   130

I'd like to create a new column with a 1 if the 'Value' is in any of the intervals ([start1-end1], [start2-end2], etc) and a 0 if it is not; so in the case above, the first row would have a value of 1 because 212 falls in the second interval and the second row would be a 0. Note that these are the orders of the columns currently and edge cases (value matching the start or end of the interval) should be encoded as a 1.

I could do this with ifelse statements but there are 260 columns and I feel like the solution could be useful in other ways in the future.

sirallen · Accepted Answer

With the data.table package:

library(data.table)

dt = data.table(Value=c(212,80), start1=c(82,57), start2=c(195,95), start3=c(409,111),
                end1=c(97,69), end2=c(220,100), end3=c(411,130))

dt[, rowid:= .I]

It's more natural to do this with a 'melted' table:

dt_melt = melt(dt, id=c('rowid','Value'), measure=patterns('start','end'),
               variable.name='interval', value.name=c('start','end'))

#    rowid Value interval start end
# 1:     1   212        1    82  97
# 2:     2    80        1    57  69
# 3:     1   212        2   195 220
# 4:     2    80        2    95 100
# 5:     1   212        3   409 411
# 6:     2    80        3   111 130

Now we can do the computation and merge with dt on the rowid:

dt[dt_melt[, as.integer(any(between(Value, start, end))), by='rowid'], on='rowid']

#    Value start1 start2 start3 end1 end2 end3 rowid V1
# 1:   212     82    195    409   97  220  411     1  1
# 2:    80     57     95    111   69  100  130     2  0

Find interval over similar column names in R

Answers (2)

Related Questions