Reputation: 168
I have two tibbles: one holds the interval of each category, and the other holds the occurrences recorded for each name:
library(tibble)

# Intervals: one [start, end] range per category, grouped by name
int_df <- tibble(name     = c(rep("John", 5), rep("Adam", 5)),
                 category = LETTERS[1:10],
                 start    = c(1, 14, 23, 35, 44, 52, 67, 75, 88, 91),
                 end      = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

# Occurrences to be matched against the intervals
occ_df <- tibble(name       = c(rep("John", 10), rep("Adam", 10)),
                 occurrence = c(1, 4, 8, 10, 12, 15, 27, 29, 34, 47,
                                52, 57, 64, 75, 78, 81, 82, 84, 86, 95)
)
I want to find the interval in int_df that each occurrence from occ_df falls into and return the name of the corresponding category. If an occurrence does not fall in any interval, I would like the output to be "outside".
Here is the expected outcome:
# A tibble: 20 x 3
   name  occurrence category
   <chr>      <dbl> <chr>
 1 John           1 A
 2 John           4 A
 3 John           8 A
 4 John          10 A
 5 John          12 outside
 6 John          15 B
 7 John          27 C
 8 John          29 C
 9 John          34 outside
10 John          47 E
11 Adam          52 F
12 Adam          57 F
13 Adam          64 outside
14 Adam          75 H
15 Adam          78 H
16 Adam          81 outside
17 Adam          82 outside
18 Adam          84 outside
19 Adam          86 outside
20 Adam          95 J
I would prefer a solution using the tidyverse (dplyr) or data.table.
Upvotes: 1
Views: 50
Reputation: 14764
Try:
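library(data.table)

# Non-equi join: for each row of occ_df, look up the int_df row with the same
# name whose [start, end] range contains the occurrence; unmatched rows get NA.
# Note that setDT() converts both tibbles to data.tables by reference.
setDT(int_df)[setDT(occ_df),
              .(name, occurrence, category = replace(category, is.na(category), 'outside')),
              on = .(name, start <= occurrence, end >= occurrence)]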
Output:
    name occurrence category
 1: John          1        A
 2: John          4        A
 3: John          8        A
 4: John         10        A
 5: John         12  outside
 6: John         15        B
 7: John         27        C
 8: John         29        C
 9: John         34  outside
10: John         47        E
11: Adam         52        F
12: Adam         57        F
13: Adam         64  outside
14: Adam         75        H
15: Adam         78        H
16: Adam         81  outside
17: Adam         82  outside
18: Adam         84  outside
19: Adam         86  outside
20: Adam         95        J
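If you would rather stay in dplyr, the same non-equi join can probably be written with join_by(). This is only a sketch and assumes dplyr 1.1.0 or later, the release that added join_by() and its between() helper. The result name is my own choice, and the two tibbles are rebuilt from the question so the snippet runs on its own:

```r
library(dplyr)

# Data copied from the question
int_df <- tibble(name     = c(rep("John", 5), rep("Adam", 5)),
                 category = LETTERS[1:10],
                 start    = c(1, 14, 23, 35, 44, 52, 67, 75, 88, 91),
                 end      = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100))

occ_df <- tibble(name       = c(rep("John", 10), rep("Adam", 10)),
                 occurrence = c(1, 4, 8, 10, 12, 15, 27, 29, 34, 47,
                                52, 57, 64, 75, 78, 81, 82, 84, 86, 95))

result <- occ_df %>%
  # Keep every occurrence; match on name and on start <= occurrence <= end
  left_join(int_df, by = join_by(name, between(occurrence, start, end))) %>%
  # Occurrences with no matching interval come back as NA
  mutate(category = coalesce(category, "outside")) %>%
  select(name, occurrence, category)

print(result, n = 20)
```

The between() bounds are inclusive by default, so an occurrence of 10 still lands in category A. If intervals could overlap for the same name, an occurrence would match more than one row and the result would have more than 20 rows.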
Upvotes: 2