Reputation: 331
I want to keep only the rows where each group's numbers are at or above that group's threshold. For example, I want group 1 to have values above .25, group 2 above .5, etc. Reprex below.
set.seed(1234)
group <- c(rep("group 1", 30),
rep("group 2", 30),
rep("group 3", 30),
rep("group 4", 30))
number <- c(runif(30, 0, .5), #group 1 data
runif(30, .25, .75), #group 2 data, etc.
runif(30, .5, 1),
runif(30, .75, 1.25))
d <- data.frame(group = group,
number = number)
threshold <- c(.25, .5, .75, 1)
library(dplyr)
d %>% group_by(group) %>% filter(number >= threshold)
The final line produces these warnings:
Warning messages:
1: In number >= threshold :
longer object length is not a multiple of shorter object length
2: In number >= threshold :
longer object length is not a multiple of shorter object length
3: In number >= threshold :
longer object length is not a multiple of shorter object length
4: In number >= threshold :
longer object length is not a multiple of shorter object length
Please advise. Thanks!
Upvotes: 2
Views: 662
Reputation: 33802
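One way to do this with groups is to add a column in which the threshold is computed from the group index. This works for your example data only because the thresholds happen to equal the group index divided by 4, so it is not a general solution. Note that cur_group_id() requires dplyr 1.0.0 or later.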
d %>%
group_by(group) %>%
mutate(threshold = cur_group_id() / 4) %>%
filter(number >= threshold)
group number threshold
1 group 1 0.3111497 0.25
2 group 1 0.3046374 0.25
3 group 1 0.3116897 0.25
4 group 1 0.4304577 0.25
5 group 1 0.3201553 0.25
6 group 1 0.3330419 0.25
7 group 1 0.2571256 0.25
8 group 1 0.3467956 0.25
9 group 1 0.2724874 0.25
10 group 1 0.4617167 0.25
11 group 1 0.4186478 0.25
12 group 1 0.4052993 0.25
13 group 1 0.2628488 0.25
14 group 1 0.4573291 0.25
15 group 1 0.4156725 0.25
16 group 2 0.5036534 0.50
17 group 2 0.6298353 0.50
18 group 2 0.7460752 0.50
19 group 2 0.6536762 0.50
20 group 2 0.5266668 0.50
21 group 2 0.5732030 0.50
22 group 2 0.5609096 0.50
23 group 2 0.5009987 0.50
24 group 2 0.5885473 0.50
25 group 2 0.6327299 0.50
26 group 2 0.6086359 0.50
27 group 2 0.5022730 0.50
28 group 2 0.5019667 0.50
29 group 2 0.6256001 0.50
30 group 2 0.6741962 0.50
31 group 3 0.9324169 0.75
32 group 3 0.8532473 0.75
33 group 3 0.7542738 0.75
34 group 3 0.7822849 0.75
35 group 3 0.9464182 0.75
36 group 3 0.8915606 0.75
37 group 3 0.7595950 0.75
38 group 3 0.8342477 0.75
39 group 3 0.9632002 0.75
40 group 3 0.7721349 0.75
41 group 3 0.9492902 0.75
42 group 3 0.9480929 0.75
43 group 4 1.2002123 1.00
44 group 4 1.0057918 1.00
45 group 4 1.1210598 1.00
46 group 4 1.0325381 1.00
47 group 4 1.1066508 1.00
48 group 4 1.2251525 1.00
49 group 4 1.2065439 1.00
50 group 4 1.2229266 1.00
51 group 4 1.1485802 1.00
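If the thresholds do not follow a formula like group index / 4, one option is to look them up by group name instead. The following is a minimal sketch, assuming the data from the question; the name `cutoffs` is hypothetical and chosen here only for illustration.

```r
library(dplyr)

# Same data as in the question, built more compactly
set.seed(1234)
d <- data.frame(
  group  = rep(paste("group", 1:4), each = 30),
  number = c(runif(30, 0, .5), runif(30, .25, .75),
             runif(30, .5, 1), runif(30, .75, 1.25))
)

# Hypothetical named lookup vector: one threshold per group name
cutoffs <- c("group 1" = .25, "group 2" = .5, "group 3" = .75, "group 4" = 1)

# Index the lookup vector by each row's group name, so every number
# is compared with its own group's threshold (no group_by needed)
kept <- d %>%
  filter(number >= unname(cutoffs[as.character(group)]))
```

Because the lookup is by name rather than by position, it should not depend on the order of the groups or on the thresholds being evenly spaced.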
Upvotes: 0
Reputation: 1438
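The warning appears because, within each group, filter() compares that group's 30 numbers against the whole length-4 threshold vector, rather than comparing group 1 to the first threshold, group 2 to the second, and so on. R recycles the shorter vector, and since 30 is not a multiple of 4, it warns once per group.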
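The recycling behaviour can be seen on a small vector. This is only an illustrative sketch: `one_group` is a made-up stand-in for the numbers in a single group, with 6 values instead of 30.

```r
one_group <- c(0.10, 0.30, 0.60, 0.90, 0.20, 0.70)  # hypothetical values for one group
threshold <- c(.25, .5, .75, 1)

# threshold is recycled to .25, .5, .75, 1, .25, .5;
# R warns because 6 is not a multiple of 4
warning_text <- tryCatch(one_group >= threshold,
                         warning = function(w) conditionMessage(w))

# The element-wise result of the recycled comparison
recycled <- suppressWarnings(one_group >= threshold)

# What was actually intended for group 1: a single threshold for every value
intended <- one_group >= threshold[1]
```

Here `recycled` and `intended` differ, which is why the warning should not simply be ignored: the filter runs, but against the wrong thresholds.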
set.seed(1234)
group <- c(rep("group 1", 30),
rep("group 2", 30),
rep("group 3", 30),
rep("group 4", 30))
number <- c(runif(30, 0, .5), #group 1 data
runif(30, .25, .75), #group 2 data, etc.
runif(30, .5, 1),
runif(30, .75, 1.25))
d <- data.frame(group = group,
number = number)
threshold <- data.frame(group = c("group 1", "group 2", "group 3", "group 4"),
threshold =c(.25, .5, .75, 1))
library(dplyr)
d %>% left_join(threshold, by = 'group') %>%
filter(number >= threshold)
By creating a lookup table and joining to it, we add a new column, threshold, to d that holds the right value for each group. When we then apply the filter, each number is compared to its own group's threshold. Done this way, we don't even need the group_by!
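To make the intermediate step visible, here is a small sketch of the join on toy data. The names `small` and `lookup`, and the extra "group 5" row, are hypothetical and are not part of the original example; they are there to show what happens to a group that has no entry in the lookup table.

```r
library(dplyr)

# Hypothetical toy data; "group 5" has no threshold defined
small <- data.frame(group  = c("group 1", "group 1", "group 2", "group 5"),
                    number = c(0.10, 0.40, 0.60, 2.00))
lookup <- data.frame(group     = c("group 1", "group 2"),
                     threshold = c(.25, .5))

# After the join every row carries its own group's threshold;
# the unmatched "group 5" row gets NA
joined <- small %>% left_join(lookup, by = "group")

# filter() keeps rows where the condition is TRUE, so the NA comparison
# for "group 5" is dropped along with the rows below their threshold
kept <- joined %>% filter(number >= threshold)
```

If dropping unmatched groups silently is not what you want, it may be worth checking for NA in the threshold column before filtering.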
Upvotes: 3
Reputation: 389275
One way would be to create a data frame with the group value and threshold:
library(dplyr)
compare_df <- data.frame(group = paste('group', 1:4), threshold)
Now you can join this data frame with d and filter:
d %>%
left_join(compare_df, by = 'group') %>%
filter(number >= threshold)
The same in base R:
subset(merge(d, compare_df, by = 'group'), number >= threshold)
Upvotes: 3