Reputation: 717
Givena matrix with 3 variables, an identifier and 2 numeric variables "A" and "B". I have defined certain ranges for the values of variable "A" and other ranges for "B". I want to extract and store the identifiers that verify that "A" and "B" variables are within the specified ranges.
Consider for instance the following example:
DF = data.frame(identifier = 1:18, A=rep(c(1,2,3,4,5,6), each=3), B=rep(c(11,12,13,14, 15, 16), 3))
interval_a = c(1, 3, 6)
interval_b = c(11, 13, 16)
I have a dataframe and I want to extract the identifiers that verify that A is between 1 and 3, and B is between 11 and 13. Then, I want the identifiers that verify that A is between 1 and 3, and B is between 13 and 16, and so on.
I know this can be done with a nested for loop like this:
identifier_list = list()
for(i in 1:(length(interval_a)-1))
{
df_tmp = DF[which(DF$A<interval_a[i+1] & DF$A>=interval_a[i]),]
for(j in 1:(length(interval_b)-1))
{
identifier_list[[(length(identifier_list) +1)]] = df_tmp[which(df_tmp$B<interval_b[j+1] & df_tmp$B>=interval_b[j]),'identifier']
}
}
But I find this not practical if the number of intervals to be considered in A and B is large. Is there any better way?
Upvotes: 2
Views: 66
Reputation: 8110
Here is one way to do this. we first expand a grid to get every combination of A and B ranges, then we nest the data in the conditions dataframe, then we map out which rows fit the criteria, and lastly we pull out the identifiers and collapse them into one variable.
library(tidyverse)
list(
expand_grid(a1 = interval_a, a2 = interval_a) %>%
filter(a1 != a2 & a1 < a2) %>%
mutate(temp = "t"),
expand_grid(b1 = interval_b, b2 = interval_b) %>%
filter(b1 != b2 & b1 < b2)%>%
mutate(temp = "t")
) %>%
reduce(full_join, by = "temp") %>%
select(-temp) %>%
mutate(data = list(DF),
identifiers = pmap_chr(list(data, a1,a2,b1,b2),
~filter(..1, A > ..2 & A < ..3 & B > ..4 & B < ..5) %>%
pull(identifier) %>% paste(., collapse = ","))) %>%
select(-data)
#> # A tibble: 9 x 5
#> a1 a2 b1 b2 identifiers
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1 3 11 13 ""
#> 2 1 3 11 16 "4,5"
#> 3 1 3 13 16 "4,5"
#> 4 1 6 11 13 "8,14"
#> 5 1 6 11 16 "4,5,8,9,10,11,14,15"
#> 6 1 6 13 16 "4,5,10,11"
#> 7 3 6 11 13 "14"
#> 8 3 6 11 16 "10,11,14,15"
#> 9 3 6 13 16 "10,11"
Upvotes: 0
Reputation: 11255
Here's a data.table approach which relies on 1) creating a lookup table and 2) doing a non-equi join.
library(data.table)
DF = data.frame(identifier = 1:18, A=rep(c(1,2,3,4,5,6), each=3), B=rep(c(11,12,13,14, 15, 16), 3))
interval_a = c(1, 3, 6)
interval_b = c(11, 13, 16)
# make lookup tables based on the a and b intervals
int_a = data.frame(A_start = interval_a[-length(interval_a)],
A_end = interval_a[-1L])
int_b = data.frame(B_start = interval_b[-length(interval_b)],
B_end = interval_b[-1L])
#all combinations of the two intervals and adding an ID
int_lookup = merge(int_a, int_b, by = NULL)
setDT(int_lookup)
int_lookup[, ID := .I]
int_lookup
#> A_start A_end B_start B_end ID
#> <num> <num> <num> <num> <int>
#> 1: 1 3 11 13 1
#> 2: 3 6 11 13 2
#> 3: 1 3 13 16 3
#> 4: 3 6 13 16 4
# make DF a data.table and do a non-equi join
setDT(DF)
DF[int_lookup,
on = .(A >= A_start,
A < A_end,
B >= B_start,
B < B_end),
comb_ID := ID]
DF
#> identifier A B comb_ID
#> <int> <num> <num> <int>
#> 1: 1 1 11 1
#> 2: 2 1 12 1
#> 3: 3 1 13 3
#> 4: 4 2 14 3
#> 5: 5 2 15 3
#> 6: 6 2 16 NA
#> 7: 7 3 11 2
#> 8: 8 3 12 2
#> 9: 9 3 13 4
#> 10: 10 4 14 4
#> 11: 11 4 15 4
#> 12: 12 4 16 NA
#> 13: 13 5 11 2
#> 14: 14 5 12 2
#> 15: 15 5 13 4
#> 16: 16 6 14 NA
#> 17: 17 6 15 NA
#> 18: 18 6 16 NA
We can get a similar list to your output if we split by comb_ID
at the end:
split(DF, by = 'comb_ID')
> split(DF, by = 'comb_ID')
$`1`
identifier A B comb_ID
<int> <num> <num> <int>
1: 1 1 11 1
2: 2 1 12 1
$`3`
identifier A B comb_ID
<int> <num> <num> <int>
1: 3 1 13 3
2: 4 2 14 3
3: 5 2 15 3
$`NA`
identifier A B comb_ID
<int> <num> <num> <int>
1: 6 2 16 NA
2: 12 4 16 NA
3: 16 6 14 NA
4: 17 6 15 NA
5: 18 6 16 NA
$`2`
identifier A B comb_ID
<int> <num> <num> <int>
1: 7 3 11 2
2: 8 3 12 2
3: 13 5 11 2
4: 14 5 12 2
$`4`
identifier A B comb_ID
<int> <num> <num> <int>
1: 9 3 13 4
2: 10 4 14 4
3: 11 4 15 4
4: 15 5 13 4
Upvotes: 2