user10316615
user10316615

Reputation:

Select rows based on group_by rows and its column values

I made a reproducible data set.

In this dataset I am trying to get the columns grouped by "value" and "category" and get the largest value of all in the "category" only if there is "value" that is greater than 4 in value in that group_by

The other way to put the question is to Get the largest "value" for each "category" for each label, only if there is a "value" greater than 4 in that "category"

das <- data.frame(val=1:24,
              weigh=c(10,10,10,11,11,11,20,20,20,21,21,21,30,30,30,31,31,31,40,40,40,41,41,41),
              value=c(4.1,3.2,4.3,1.1,2.2,5.3,2.1,2.2,3.3,3.1,8.2,1.3,3.6,2.1,3.1,3.1,3.1,1.1,7.2,4.5,5.1,3.2,2.5,9.1),
              label=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
              category=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"))

val weigh value label category
1   1   10  4.1 1   A
2   2   10  3.2 1   B
3   3   10  4.3 1   C
4   4   11  1.1 1   A
5   5   11  2.2 1   B
6   6   11  5.3 1   C
7   7   20  2.1 2   A
8   8   20  2.2 2   B
9   9   20  3.3 2   C
10  10  21  3.1 2   A
11  11  21  8.2 2   B
12  12  21  1.3 2   C
13  13  30  3.6 3   A
14  14  30  2.1 3   B
15  15  30  3.1 3   C
16  16  31  3.1 3   A
17  17  31  3.1 3   B
18  18  31  1.1 3   C
19  19  40  7.2 4   A
20  20  40  4.5 4   B
21  21  40  5.1 4   C
22  22  41  3.2 4   A
23  23  41  2.5 4   B
24  24  41  9.1 4   C

This is expected output

 val weigh value label category
 1  1   10  4.1 1   A
 5  6   11  5.3 1   C
 2  2   10  3.2 1   B
 10 10  21  3.1 2   A
 3  11  21  8.2 2   B
 9  9   20  3.3 2   C
 2  19  40  7.2 4   A
 4  20  40  4.5 4   B
 6  24  41  9.1 4   C

I tried following, but not getting expected output. Here I am just getting only the values > 4 here, not all the largest number in that category with this label

das1 <- das[das$value >4,]

result <- das1 %>% 
  group_by(category,label) %>% 
  slice(which.max(value))


 val weigh value label category
 1  1   10  4.1 1   A
 5  6   11  5.3 1   C
 3  11  21  8.2 2   B
 2  19  40  7.2 4   A
 4  20  40  4.5 4   B
 6  24  41  9.1 4   C

Upvotes: 2

Views: 1429

Answers (2)

Calum You
Calum You

Reputation: 15062

I think your worded description is confusing, because you keep saying different things. This matches your expected output, and the interpretation is

Get the largest "value" for each "category" for each label, only if there is a "value" greater than 4 in that "label" (here you said category in the OP)

library(tidyverse)
das <- data.frame(
  val = 1:24,
  weigh = c(10, 10, 10, 11, 11, 11, 20, 20, 20, 21, 21, 21, 30, 30, 30, 31, 31, 31, 40, 40, 40, 41, 41, 41),
  value = c(4.1, 3.2, 4.3, 1.1, 2.2, 5.3, 2.1, 2.2, 3.3, 3.1, 8.2, 1.3, 3.6, 2.1, 3.1, 3.1, 3.1, 1.1, 7.2, 4.5, 5.1, 3.2, 2.5, 9.1),
  label = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  category = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C")
)

das %>%
  group_by(label) %>%
  filter(any(value > 4)) %>%
  group_by(label, category) %>%
  filter(value == max(value)) %>%
  arrange(label, category)
#> # A tibble: 9 x 5
#> # Groups:   label, category [9]
#>     val weigh value label category
#>   <int> <dbl> <dbl> <dbl> <fct>   
#> 1     1    10   4.1     1 A       
#> 2     2    10   3.2     1 B       
#> 3     6    11   5.3     1 C       
#> 4    10    21   3.1     2 A       
#> 5    11    21   8.2     2 B       
#> 6     9    20   3.3     2 C       
#> 7    19    40   7.2     4 A       
#> 8    20    40   4.5     4 B       
#> 9    24    41   9.1     4 C

Created on 2019-03-07 by the reprex package (v0.2.1)

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388817

We could first group_by label and filter groups which has any value > 4 and then select only row with max value in each label and category.

library(dplyr)

das %>%
   group_by(label) %>%
   filter(any(value > 4)) %>%
   ungroup() %>%
   group_by(label, category) %>%
   slice(which.max(value))


#    val weigh value label category
#  <int> <dbl> <dbl> <dbl> <fct>   
#1     1    10   4.1     1 A       
#2     2    10   3.2     1 B       
#3     6    11   5.3     1 C       
#4    10    21   3.1     2 A       
#5    11    21   8.2     2 B       
#6     9    20   3.3     2 C       
#7    19    40   7.2     4 A       
#8    20    40   4.5     4 B       
#9    24    41   9.1     4 C       

Upvotes: 3

Related Questions