Reputation:
I made a reproducible data set.
In this dataset I am trying to get the columns grouped by "value" and "category" and get the largest value of all in the "category" only if there is "value" that is greater than 4 in value in that group_by
The other way to put the question is to Get the largest "value" for each "category" for each label, only if there is a "value" greater than 4 in that "category"
das <- data.frame(val=1:24,
weigh=c(10,10,10,11,11,11,20,20,20,21,21,21,30,30,30,31,31,31,40,40,40,41,41,41),
value=c(4.1,3.2,4.3,1.1,2.2,5.3,2.1,2.2,3.3,3.1,8.2,1.3,3.6,2.1,3.1,3.1,3.1,1.1,7.2,4.5,5.1,3.2,2.5,9.1),
label=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
category=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"))
val weigh value label category
1 1 10 4.1 1 A
2 2 10 3.2 1 B
3 3 10 4.3 1 C
4 4 11 1.1 1 A
5 5 11 2.2 1 B
6 6 11 5.3 1 C
7 7 20 2.1 2 A
8 8 20 2.2 2 B
9 9 20 3.3 2 C
10 10 21 3.1 2 A
11 11 21 8.2 2 B
12 12 21 1.3 2 C
13 13 30 3.6 3 A
14 14 30 2.1 3 B
15 15 30 3.1 3 C
16 16 31 3.1 3 A
17 17 31 3.1 3 B
18 18 31 1.1 3 C
19 19 40 7.2 4 A
20 20 40 4.5 4 B
21 21 40 5.1 4 C
22 22 41 3.2 4 A
23 23 41 2.5 4 B
24 24 41 9.1 4 C
This is expected output
val weigh value label category
1 1 10 4.1 1 A
5 6 11 5.3 1 C
2 2 10 3.2 1 B
10 10 21 3.1 2 A
3 11 21 8.2 2 B
9 9 20 3.3 2 C
2 19 40 7.2 4 A
4 20 40 4.5 4 B
6 24 41 9.1 4 C
I tried following, but not getting expected output. Here I am just getting only the values > 4 here, not all the largest number in that category with this label
das1 <- das[das$value >4,]
result <- das1 %>%
group_by(category,label) %>%
slice(which.max(value))
val weigh value label category
1 1 10 4.1 1 A
5 6 11 5.3 1 C
3 11 21 8.2 2 B
2 19 40 7.2 4 A
4 20 40 4.5 4 B
6 24 41 9.1 4 C
Upvotes: 2
Views: 1429
Reputation: 15062
I think your worded description is confusing, because you keep saying different things. This matches your expected output, and the interpretation is
Get the largest "value" for each "category" for each label, only if there is a "value" greater than 4 in that "label" (here you said category in the OP)
library(tidyverse)
das <- data.frame(
val = 1:24,
weigh = c(10, 10, 10, 11, 11, 11, 20, 20, 20, 21, 21, 21, 30, 30, 30, 31, 31, 31, 40, 40, 40, 41, 41, 41),
value = c(4.1, 3.2, 4.3, 1.1, 2.2, 5.3, 2.1, 2.2, 3.3, 3.1, 8.2, 1.3, 3.6, 2.1, 3.1, 3.1, 3.1, 1.1, 7.2, 4.5, 5.1, 3.2, 2.5, 9.1),
label = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4),
category = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C")
)
das %>%
group_by(label) %>%
filter(any(value > 4)) %>%
group_by(label, category) %>%
filter(value == max(value)) %>%
arrange(label, category)
#> # A tibble: 9 x 5
#> # Groups: label, category [9]
#> val weigh value label category
#> <int> <dbl> <dbl> <dbl> <fct>
#> 1 1 10 4.1 1 A
#> 2 2 10 3.2 1 B
#> 3 6 11 5.3 1 C
#> 4 10 21 3.1 2 A
#> 5 11 21 8.2 2 B
#> 6 9 20 3.3 2 C
#> 7 19 40 7.2 4 A
#> 8 20 40 4.5 4 B
#> 9 24 41 9.1 4 C
Created on 2019-03-07 by the reprex package (v0.2.1)
Upvotes: 2
Reputation: 388817
We could first group_by
label
and filter
groups which has any
value > 4
and then select only row with max
value
in each label
and category
.
library(dplyr)
das %>%
group_by(label) %>%
filter(any(value > 4)) %>%
ungroup() %>%
group_by(label, category) %>%
slice(which.max(value))
# val weigh value label category
# <int> <dbl> <dbl> <dbl> <fct>
#1 1 10 4.1 1 A
#2 2 10 3.2 1 B
#3 6 11 5.3 1 C
#4 10 21 3.1 2 A
#5 11 21 8.2 2 B
#6 9 20 3.3 2 C
#7 19 40 7.2 4 A
#8 20 40 4.5 4 B
#9 24 41 9.1 4 C
Upvotes: 3