Select rows based on group_by rows and its column values

Question

I made a reproducible data set.

In this dataset I am trying to get the columns grouped by "value" and "category" and get the largest value of all in the "category" only if there is "value" that is greater than 4 in value in that group_by

The other way to put the question is to Get the largest "value" for each "category" for each label, only if there is a "value" greater than 4 in that "category"

das <- data.frame(val=1:24,
              weigh=c(10,10,10,11,11,11,20,20,20,21,21,21,30,30,30,31,31,31,40,40,40,41,41,41),
              value=c(4.1,3.2,4.3,1.1,2.2,5.3,2.1,2.2,3.3,3.1,8.2,1.3,3.6,2.1,3.1,3.1,3.1,1.1,7.2,4.5,5.1,3.2,2.5,9.1),
              label=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
              category=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"))

val weigh value label category
1   1   10  4.1 1   A
2   2   10  3.2 1   B
3   3   10  4.3 1   C
4   4   11  1.1 1   A
5   5   11  2.2 1   B
6   6   11  5.3 1   C
7   7   20  2.1 2   A
8   8   20  2.2 2   B
9   9   20  3.3 2   C
10  10  21  3.1 2   A
11  11  21  8.2 2   B
12  12  21  1.3 2   C
13  13  30  3.6 3   A
14  14  30  2.1 3   B
15  15  30  3.1 3   C
16  16  31  3.1 3   A
17  17  31  3.1 3   B
18  18  31  1.1 3   C
19  19  40  7.2 4   A
20  20  40  4.5 4   B
21  21  40  5.1 4   C
22  22  41  3.2 4   A
23  23  41  2.5 4   B
24  24  41  9.1 4   C

This is expected output

 val weigh value label category
 1  1   10  4.1 1   A
 5  6   11  5.3 1   C
 2  2   10  3.2 1   B
 10 10  21  3.1 2   A
 3  11  21  8.2 2   B
 9  9   20  3.3 2   C
 2  19  40  7.2 4   A
 4  20  40  4.5 4   B
 6  24  41  9.1 4   C

I tried following, but not getting expected output. Here I am just getting only the values > 4 here, not all the largest number in that category with this label

das1 <- das[das$value >4,]

result <- das1 %>% 
  group_by(category,label) %>% 
  slice(which.max(value))


 val weigh value label category
 1  1   10  4.1 1   A
 5  6   11  5.3 1   C
 3  11  21  8.2 2   B
 2  19  40  7.2 4   A
 4  20  40  4.5 4   B
 6  24  41  9.1 4   C

Ronak Shah · Accepted Answer

We could first group_by label and filter groups which has any value > 4 and then select only row with max value in each label and category.

library(dplyr)

das %>%
   group_by(label) %>%
   filter(any(value > 4)) %>%
   ungroup() %>%
   group_by(label, category) %>%
   slice(which.max(value))


#    val weigh value label category
#         
#1     1    10   4.1     1 A       
#2     2    10   3.2     1 B       
#3     6    11   5.3     1 C       
#4    10    21   3.1     2 A       
#5    11    21   8.2     2 B       
#6     9    20   3.3     2 C       
#7    19    40   7.2     4 A       
#8    20    40   4.5     4 B       
#9    24    41   9.1     4 C

Select rows based on group_by rows and its column values

Answers (2)

Related Questions