Reputation: 811

Selecting the top rows of a data frame, with at most one row per group

I have a data frame such as:

set.seed(1)
df <- data.frame(
  sample = 1:50,
  value = runif(50),
  group = c(rep(NA, 20), gl(3, 10)))

I want to select the top 10 samples based on value. However, if a sample belongs to a group, I only want to include one sample from that group. If group is NA, I want to include all of those samples. Arranging df by value and taking the top 10 looks like:

library(dplyr)

df_top <- df %>% 
  arrange(-value) %>% 
  top_n(10, value)

   sample     value group
1      46 0.7973088     3
2      49 0.8108702     3
3      22 0.8394404     1
4       2 0.8612095    NA
5      27 0.8643395     1
6      20 0.8753213    NA
7      44 0.8762692     3
8      26 0.8921983     1
9      11 0.9128759    NA
10     30 0.9606180     1

I would want to include samples 36, 22, 2, 20, 11, and the next five highest values in my data frame that continue to fit the pattern. How do I accomplish this?

Upvotes: 1

Answers (2)

acylam

Reputation: 18661

A similar method that uses slice instead of filter:

library(dplyr)

df_top <- df %>%
  arrange(-value) %>%
  group_by(group) %>%
  # non-NA group: keep only the first (highest-value) row; NA group: keep every row
  slice(if (any(!is.na(group))) 1 else 1:n()) %>%
  ungroup() %>%
  top_n(10, value)

Result:

# A tibble: 10 x 3
   sample     value group
    <int>     <dbl> <int>
 1     21 0.9347052     1
 2     35 0.8273733     2
 3     41 0.8209463     3
 4     18 0.9919061    NA
 5      7 0.9446753    NA
 6      4 0.9082078    NA
 7      6 0.8983897    NA
 8     20 0.7774452    NA
 9     15 0.7698414    NA
10     17 0.7176185    NA
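
For what it's worth, top_n() has been superseded by slice_max() since dplyr 1.0.0. Below is a minimal sketch of the same slice-based approach using slice_max() for the final step. It assumes dplyr >= 1.0.0 and rebuilds the df from the question so that it runs on its own; the all(is.na(group)) test is just my way of writing the same condition.

```r
library(dplyr)

# Same data as in the question
set.seed(1)
df <- data.frame(
  sample = 1:50,
  value = runif(50),
  group = c(rep(NA, 20), gl(3, 10)))

df_top <- df %>%
  arrange(desc(value)) %>%
  group_by(group) %>%
  # NA group: keep every row; real group: keep only its highest-value row
  slice(if (all(is.na(group))) seq_len(n()) else 1) %>%
  ungroup() %>%
  # take the 10 largest values among the remaining candidates
  slice_max(value, n = 10)
```

Unlike top_n(), slice_max() returns the rows sorted by value in descending order. It keeps ties by default (with_ties = TRUE), which does not matter here because the runif() values are distinct.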

Upvotes: 0

user42485

Reputation: 811

I think I figured this out. Would this be the best way?

library(dplyr)

df_top <- df %>% 
  arrange(-value) %>% 
  group_by(group) %>% 
  # non-NA group: keep the row(s) holding the group maximum; NA group: keep all rows
  filter(ifelse(!is.na(group), value == max(value), value == value)) %>% 
  ungroup() %>%
  top_n(10, value)

# A tibble: 10 x 3
   sample value group
    <int> <dbl> <int>
 1     18 0.992    NA
 2      7 0.945    NA
 3     21 0.935     1
 4      4 0.908    NA
 5      6 0.898    NA
 6     35 0.827     2
 7     41 0.821     3
 8     20 0.777    NA
 9     15 0.770    NA
10     17 0.718    NA
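
The ifelse() condition can probably be simplified. value == value is TRUE for every non-missing value, so the whole test reduces to "the group is NA, or this row holds the group maximum". A sketch of that variant, assuming the df from the question (rebuilt here so the block is self-contained) and dplyr >= 1.0.0 for slice_head():

```r
library(dplyr)

# Same data as in the question
set.seed(1)
df <- data.frame(
  sample = 1:50,
  value = runif(50),
  group = c(rep(NA, 20), gl(3, 10)))

df_top <- df %>%
  arrange(desc(value)) %>%
  group_by(group) %>%
  # keep all rows of the NA group, and the maximum row(s) of every other group
  filter(is.na(group) | value == max(value)) %>%
  ungroup() %>%
  # rows are already sorted by value, so the first 10 are the top 10
  slice_head(n = 10)
```

Note that value == max(value) keeps every row tied for the maximum within a group. If ties are possible in your real data and you need exactly one row per group, a position-based approach such as slice(1) after arranging may be safer.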

Upvotes: 2
