Find the interval in which a number falls across two data frames

Question

I have two tibbles: one holds the intervals (start and end) for each category, and the other holds the occurrences recorded for each name.

library(tibble)

int_df <- tibble(name = c(rep("John", 5), rep("Adam", 5)),
                 category = c(LETTERS[1:10]),
                 start = c(1, 14, 23, 35, 44, 52, 67, 75, 88, 91),
                 end = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
                 )

occ_df <- tibble(name = c(rep("John", 10), rep("Adam", 10)),
                 occurrence = c(1, 4, 8, 10, 12, 15, 27, 29, 34, 47,
                                52, 57, 64, 75, 78, 81, 82, 84, 86, 95)
                 )

For each occurrence in occ_df, I want to find the interval in int_df (for the same name) that contains it and return the corresponding category. If an occurrence does not fall in any interval, I would like the output to be "outside".

Here is the expected outcome:

# A tibble: 20 x 3
   name  occurrence category
   <chr>      <dbl> <chr>   
 1 John           1 A       
 2 John           4 A       
 3 John           8 A       
 4 John          10 A       
 5 John          12 outside 
 6 John          15 B       
 7 John          27 C       
 8 John          29 C       
 9 John          34 outside 
10 John          47 E       
11 Adam          52 F       
12 Adam          57 F       
13 Adam          64 outside 
14 Adam          75 H       
15 Adam          78 H       
16 Adam          81 outside 
17 Adam          82 outside 
18 Adam          84 outside 
19 Adam          86 outside 
20 Adam          95 J

I would prefer a solution using the tidyverse (dplyr) or data.table.

arg0naut91 · Accepted Answer

Try:

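library(data.table)

# setDT() converts both tibbles to data.tables by reference.
# Non-equi join: for each row of occ_df, look up the int_df row with the
# same name whose [start, end] range contains the occurrence.
# Occurrences with no match get NA, which is replaced by 'outside'.
setDT(int_df)[setDT(occ_df),
              .(name, occurrence, category = replace(category, is.na(category), 'outside')),
              on = .(name, start <= occurrence, end >= occurrence)]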

Output:

    name occurrence category
 1: John          1        A
 2: John          4        A
 3: John          8        A
 4: John         10        A
 5: John         12  outside
 6: John         15        B
 7: John         27        C
 8: John         29        C
 9: John         34  outside
10: John         47        E
11: Adam         52        F
12: Adam         57        F
13: Adam         64  outside
14: Adam         75        H
15: Adam         78        H
16: Adam         81  outside
17: Adam         82  outside
18: Adam         84  outside
19: Adam         86  outside
20: Adam         95        J
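
Since the question also mentions dplyr, here is a minimal sketch of the same lookup as a dplyr join. It is not part of the answer above, and it assumes dplyr >= 1.1.0, which added join_by() and its between() helper for range conditions. The name result is only illustrative.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0 for join_by()

# Same data as in the question (tibble() is re-exported by dplyr)
int_df <- tibble(
  name = rep(c("John", "Adam"), each = 5),
  category = LETTERS[1:10],
  start = c(1, 14, 23, 35, 44, 52, 67, 75, 88, 91),
  end = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

occ_df <- tibble(
  name = rep(c("John", "Adam"), each = 10),
  occurrence = c(1, 4, 8, 10, 12, 15, 27, 29, 34, 47,
                 52, 57, 64, 75, 78, 81, 82, 84, 86, 95)
)

result <- occ_df %>%
  # Keep every occurrence; match on name and start <= occurrence <= end
  left_join(int_df, by = join_by(name, between(occurrence, start, end))) %>%
  # Occurrences without a matching interval have NA in category
  mutate(category = coalesce(category, "outside")) %>%
  select(name, occurrence, category)

print(result, n = 20)
```

Because it is a left join from occ_df, the rows keep the order of occ_df, and between() treats both interval ends as inclusive by default, which matches the start <= occurrence, end >= occurrence conditions in the data.table version.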
