sgdata
sgdata

Reputation: 2763

Dplyr tally/count by multiple filters

I am trying to create a summarise/filter dplyr pipeline which will be equivalent to the below:

iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
       Petal.Area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
      filter(Sepal.Area < 17) %>%
        tally() %>%
      filter(Sepal.Area > 17 & Sepal.Area < 22) %>%
        tally() %>%
      filter(Sepal.Area > 22) %>%
        tally()

Or another potential approach:

iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
    summarise(n(Sepal.Area < 17),
              n(Sepal.Area > 17 & Sepal.Area < 22),
              n(Sepal.Area > 22))

What is the easiest way to get counts by multiple filters on a grouping? Or simply run each one and join them later?

Upvotes: 2

Views: 1063

Answers (3)

rsmith54
rsmith54

Reputation: 805

I think using cut is the right approach. I don't have the reputation to comment on this answer , but you can also use labels.

iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
       Petal.Area = Petal.Length * Petal.Width) %>% 
mutate(size = cut(Sepal.Area, breaks = c(0, 17, 22, Inf), 
                              labels = c("small", "medium", "large"))) %>%
group_by(size) %>% summarize(count = n())

Upvotes: 1

mt1022
mt1022

Reputation: 17299

you can try cut:

iris %>%
    mutate(Sepal.Area = Sepal.Length * Sepal.Width,
           Petal.Area = Petal.Length * Petal.Width,
           range = cut(Sepal.Area, breaks = c(0, 17, 22, Inf))) %>%
    group_by(Species, range) %>%
    summarize(count = n())

#      Species    range count
#       <fctr>   <fctr> <int>
# 1     setosa   (0,17]    25
# 2     setosa  (17,22]    22
# 3     setosa (22,Inf]     3
# 4 versicolor   (0,17]    28
# 5 versicolor  (17,22]    21
# 6 versicolor (22,Inf]     1
# 7  virginica   (0,17]    10
# 8  virginica  (17,22]    31
# 9  virginica (22,Inf]     9

Upvotes: 3

vagabond
vagabond

Reputation: 3594

You have to create groups for different Sepal.Area ranges you want and then group and count by those. Try this:

iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
       Petal.Area = Petal.Length * Petal.Width) %>% mutate(Sepal.Area.Groups = ifelse(Sepal.Area < 17, 'Sep_less_17', ifelse(Sepal.Area > 17 & Sepal.Area < 22, 'Sep_bet_1722', ifelse(Sepal.Area > 22, 'Sep_gre_22', 'other')))) %>% 
  group_by(Sepal.Area.Groups) %>%
        tally()

# A tibble: 4 x 2
  Sepal.Area.Groups     n
              <chr> <int>
1      Sep_bet_1722    74
2        Sep_gre_22    13
3       Sep_less_17    61
4             other     2

Using dplyr, if you apply a filter after doing a tally, you are basically filtering on the tallied table.

Upvotes: 1

Related Questions