Reputation: 2763
I am trying to create a summarise
/filter
dplyr
pipeline which will be equivalent to the below:
iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
Petal.Area = Petal.Length * Petal.Width) %>%
group_by(Species) %>%
filter(Sepal.Area < 17) %>%
tally() %>%
filter(Sepal.Area > 17 & Sepal.Area < 22) %>%
tally() %>%
filter(Sepal.Area > 22) %>%
tally()
Or another potential approach:
iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
Petal.Area = Petal.Length * Petal.Width) %>%
group_by(Species) %>%
summarise(n(Sepal.Area < 17),
n(Sepal.Area > 17 & Sepal.Area < 22),
n(Sepal.Area > 22))
What is the easiest way to get counts by multiple filters on a grouping? Or simply run each one and join them later?
Upvotes: 2
Views: 1063
Reputation: 805
I think using cut is the right approach. I don't have the reputation to comment on this answer , but you can also use labels.
iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
Petal.Area = Petal.Length * Petal.Width) %>%
mutate(size = cut(Sepal.Area, breaks = c(0, 17, 22, Inf),
labels = c("small", "medium", "large"))) %>%
group_by(size) %>% summarize(count = n())
Upvotes: 1
Reputation: 17299
you can try cut
:
iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
Petal.Area = Petal.Length * Petal.Width,
range = cut(Sepal.Area, breaks = c(0, 17, 22, Inf))) %>%
group_by(Species, range) %>%
summarize(count = n())
# Species range count
# <fctr> <fctr> <int>
# 1 setosa (0,17] 25
# 2 setosa (17,22] 22
# 3 setosa (22,Inf] 3
# 4 versicolor (0,17] 28
# 5 versicolor (17,22] 21
# 6 versicolor (22,Inf] 1
# 7 virginica (0,17] 10
# 8 virginica (17,22] 31
# 9 virginica (22,Inf] 9
Upvotes: 3
Reputation: 3594
You have to create groups for different Sepal.Area ranges you want and then group and count by those. Try this:
iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
Petal.Area = Petal.Length * Petal.Width) %>% mutate(Sepal.Area.Groups = ifelse(Sepal.Area < 17, 'Sep_less_17', ifelse(Sepal.Area > 17 & Sepal.Area < 22, 'Sep_bet_1722', ifelse(Sepal.Area > 22, 'Sep_gre_22', 'other')))) %>%
group_by(Sepal.Area.Groups) %>%
tally()
# A tibble: 4 x 2
Sepal.Area.Groups n
<chr> <int>
1 Sep_bet_1722 74
2 Sep_gre_22 13
3 Sep_less_17 61
4 other 2
Using dplyr, if you apply a filter after doing a tally, you are basically filtering on the tallied table.
Upvotes: 1