Reputation: 1365
I have a data set which has some big groups, each containing subgroups (small groups).
I want to select small group 1 for each big group. But if small group 1 doesn't exist in a big group, select small group 2 instead. My example below stops here, but ideally this would keep going: if small group 2 is not found, select small group 3, and so on. In the example I use numbers, but my focus is on doing this with factor levels.
Is this possible with factors in dplyr, assuming the factor levels are ordered in terms of importance?
Here is my example data:
library(dplyr)  # provides %>% and the verbs used in the answers

set.seed(123)
big_group = rep(1:3, each = 6)
small_group = c(sample(1:2, size = 6, replace = TRUE),
                rep(1, each = 6),
                rep(2, each = 6)) %>%
  as.factor()
d = data.frame(big_group,
               small_group,
               value = runif(n = 3 * 6))
And the ideal output would be:
big_group small_group value
1 1 0.52810549
2 1 0.67757064
3 2 0.32792072
Upvotes: 2
Views: 98
Reputation: 11878
Combining both answers from @akrun and @KarolisKoncevičius you could also just do:
d %>%
  group_by(big_group) %>%
  slice(which.min(small_group))
#> # A tibble: 3 x 3
#> # Groups: big_group [3]
#> big_group small_group value
#> <int> <fct> <dbl>
#> 1 1 1 0.528
#> 2 2 1 0.678
#> 3 3 2 0.328
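Since the question is about factor levels rather than numbers, here is a minimal sketch of the same idea with labelled levels. The data frame `toy` and the level names "high", "medium" and "low" are made up for illustration. It assumes the levels are declared in order of importance, and it converts to integer codes explicitly so the ranking does not depend on how `which.min()` coerces a factor.

```r
library(dplyr)

# Hypothetical data: levels are declared from most to least important.
toy <- data.frame(
  big_group   = c("a", "a", "a", "b", "b", "c"),
  small_group = factor(c("low", "high", "medium", "low", "medium", "low"),
                       levels = c("high", "medium", "low")),
  value       = c(10, 20, 30, 40, 50, 60)
)

picked <- toy %>%
  group_by(big_group) %>%
  # as.integer() gives the level codes (1 = "high"), so which.min()
  # finds the first row holding the most important level present.
  slice(which.min(as.integer(small_group))) %>%
  ungroup()

print(picked)
```

Group "a" should come back with "high", group "b" with "medium" (it has no "high" row), and group "c" with "low".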
Upvotes: 2
Reputation: 887118
We group by 'big_group', filter the rows having the min value for 'small_group', and then slice the first row
d %>%
  group_by(big_group) %>%
  filter(as.numeric(small_group) == min(as.numeric(small_group))) %>%
  slice(1)  # integer index; recent dplyr versions reject a logical vector here
# A tibble: 3 x 3
# Groups: big_group [3]
# big_group small_group value
# <int> <fctr> <dbl>
#1 1 1 0.528
#2 2 1 0.678
#3 3 2 0.328
Or use match with slice, looking up the first level present in each group within the column itself
d %>%
  group_by(big_group) %>%
  slice(match(levels(droplevels(small_group))[1], small_group))
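To see what the `match` approach is meant to do inside a single group, here is a small base R sketch. The factor `g` is a made-up stand-in for one big group's `small_group` values, not data from the question. The idea is to take the first level that is actually present and then find its first position in the column.

```r
# Hypothetical values for one big group; levels ordered by importance.
g <- factor(c("3", "2", "3", "2"), levels = c("1", "2", "3"))

# Level "1" is absent here, so after dropping unused levels
# the first remaining level is "2".
top <- levels(droplevels(g))[1]

# match() returns the position of the first element equal to `top`
# (the factor is compared by its labels).
idx <- match(top, g)
```

Here `idx` is the row that `slice()` would keep for this group: the first row carrying the most important level that exists in it.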
Upvotes: 2
Reputation: 9656
Not a dplyr solution, but in base R you can do:
# index within each group's subset x, not the full data frame d
do.call(rbind, by(d, d$big_group, function(x) x[which.min(x$small_group), ]))
# big_group small_group value
# 1 1 1 0.5281055
# 2 2 1 0.6775706
# 3 3 2 0.3279207
Upvotes: 2