Reputation: 811
I have a data frame such as:
set.seed(1)
df <- data.frame(
  sample = 1:50,
  value = runif(50),
  group = c(rep(NA, 20), gl(3, 10)))
I want to select the top 10 samples by value. However, if a sample belongs to a group, I only want to include one sample from that group. If group is NA, I want to include all of them. Arranging df by value gives:
library(dplyr)

df_top <- df %>%
  arrange(-value) %>%
  top_n(10, value)
   sample     value group
1      46 0.7973088     3
2      49 0.8108702     3
3      22 0.8394404     1
4       2 0.8612095    NA
5      27 0.8643395     1
6      20 0.8753213    NA
7      44 0.8762692     3
8      26 0.8921983     1
9      11 0.9128759    NA
10     30 0.9606180     1
I would want to include samples 36, 22, 2, 20, 11, and the next five highest values in my data frame that continue to fit the pattern. How do I accomplish this?
Upvotes: 1
Views: 55
Reputation: 18661
A similar method that uses slice instead of filter:
library(dplyr)
df_top <- df %>%
  arrange(-value) %>%
  group_by(group) %>%
  slice(if (any(!is.na(group))) 1 else 1:n()) %>%
  ungroup() %>%
  top_n(10, value)
Result:
# A tibble: 10 x 3
   sample     value group
    <int>     <dbl> <int>
 1     21 0.9347052     1
 2     35 0.8273733     2
 3     41 0.8209463     3
 4     18 0.9919061    NA
 5      7 0.9446753    NA
 6      4 0.9082078    NA
 7      6 0.8983897    NA
 8     20 0.7774452    NA
 9     15 0.7698414    NA
10     17 0.7176185    NA
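One side note: in dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max(), which also returns the rows sorted by value. Below is a minimal sketch of the same idea with that swap, assuming dplyr >= 1.0.0 is installed. The name df_top_sorted is only illustrative, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(1)
df <- data.frame(
  sample = 1:50,
  value = runif(50),
  group = c(rep(NA, 20), gl(3, 10)))

df_top_sorted <- df %>%
  arrange(desc(value)) %>%
  group_by(group) %>%
  # Real groups: keep only the first (highest) row.
  # NA group: keep every row.
  slice(if (any(!is.na(group))) 1 else seq_len(n())) %>%
  ungroup() %>%
  # Superseding replacement for top_n(10, value); output is ordered by value
  slice_max(value, n = 10)

print(df_top_sorted)
```

Like top_n(), slice_max() keeps ties by default (with_ties = TRUE), so it can return more than 10 rows if the 10th value is tied.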
Upvotes: 0
Reputation: 811
I think I figured this out. Is this the best way to do it?
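df_top <- df %>%
  arrange(-value) %>%
  group_by(group) %>%
  # real groups: keep only the row(s) with the maximum value;
  # NA group: value == value is TRUE for every row, so all are kept
  filter(ifelse(!is.na(group), value == max(value), value == value)) %>%
  ungroup() %>%
  top_n(10, value)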
# A tibble: 10 x 3
   sample value group
    <int> <dbl> <int>
 1     18 0.992    NA
 2      7 0.945    NA
 3     21 0.935     1
 4      4 0.908    NA
 5      6 0.898    NA
 6     35 0.827     2
 7     41 0.821     3
 8     20 0.777    NA
 9     15 0.770    NA
10     17 0.718    NA
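For comparison, the same selection can be sketched in base R without dplyr. This is only one possible approach, not necessarily the best one: it sorts by value, then uses duplicated() to keep the first row seen for each non-NA group. The names ord, keep and df_top_base are illustrative.

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(
  sample = 1:50,
  value = runif(50),
  group = c(rep(NA, 20), gl(3, 10)))

# Sort by value, largest first
ord <- df[order(-df$value), ]

# After sorting, the first row seen for each group is that group's maximum.
# Rows with an NA group are always kept.
keep <- is.na(ord$group) | !duplicated(ord$group)

# Take the first 10 surviving rows
df_top_base <- head(ord[keep, ], 10)

print(df_top_base)
```

Unlike the filter() and top_n() version, this always returns at most 10 rows and keeps exactly one row per group even if the group maximum is tied.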
Upvotes: 2