Maximilian
Maximilian

Reputation: 121

Match Value and Interval via overlaps

Right now I have two data frames:

amount = c(19,21,39,45,62,71,100,121,130,160,180,210,240)
id = rep(1,length(amount))
test <- data.frame(id,amount)


interval = c(1:10)
bottom = c(0,25,50,75,100,125,150,175,200,225)
top = c(24,49,74,99,124,149,174,199,224,NA)
test_2 <- data.frame(interval,bottom,top)

testcontains values of increasing amounts and test_2contains intervals. I am trying to join the two frames so that for every amount in test I get the correspondent ìnterval nr. as a new column, say test$intervall. I tried foverlapsalready like this:

setDT(test)
setDT(test_2)
foverlaps(test_2,test,by.x = amount, by.y = c(bottom,top), type = "within")

However, this returns an error message, saying

Invalid numeric value for 'by.x'; it should be a vector with values 1 <= by.x <= length(x)

Maybe one of you can help me out here, thanks!

Upvotes: 1

Views: 120

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388817

You can try fuzzyjoin :

fuzzyjoin::fuzzy_left_join(test, test_2, 
                           by = c('amount' = 'top', 'amount' = 'bottom'), 
                           match_fun = list(`<=`, `>=`))

#   id amount interval bottom top
#1   1     19        1      0  24
#2   1     21        1      0  24
#3   1     39        2     25  49
#4   1     45        2     25  49
#5   1     62        3     50  74
#6   1     71        3     50  74
#7   1    100        5    100 124
#8   1    121        5    100 124
#9   1    130        6    125 149
#10  1    160        7    150 174
#11  1    180        8    175 199
#12  1    210        9    200 224
#13  1    240       NA     NA  NA

Upvotes: 1

Karthik S
Karthik S

Reputation: 11584

Does this work:

library(dplyr)
library(tidyr)
test_2 %>% mutate(top = replace_na(top, 250)) %>% rowwise() %>% 
mutate(iv = list(seq(bottom, top, by = 1))) %>% 
unnest(iv) %>% right_join(test, by = c('iv' = 'amount'), keep = T) %>% 
select(id, interval, amount)
# A tibble: 13 x 3
      id interval amount
   <dbl>    <int>  <dbl>
 1     1        1     19
 2     1        1     21
 3     1        2     39
 4     1        2     45
 5     1        3     62
 6     1        3     71
 7     1        5    100
 8     1        5    121
 9     1        6    130
10     1        7    160
11     1        8    180
12     1        9    210
13     1       10    240

Upvotes: 1

Related Questions