haimen

Reputation: 2015

Filter out groups whose proportion is below a threshold in dplyr (R)

Suppose I have the following data,

library(dplyr)
data(mtcars)
# tbl_dt() is no longer part of dplyr; a plain tibble works for this example
mtcars = as_tibble(mtcars)

I am using the following command,

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = (n / sum(n)) * 100)

I get the following output,

am gear  n     freq
 0    3 15      79
 0    4  4      21
 1    4  8      62
 1    5  5      38

Now I want to filter out all the rows belonging to groups whose freq is below a given threshold. For example, with a threshold of 25, I want to remove the 4 rows in the group whose proportion is below 25 (am = 0, gear = 4), so the output should contain 28 rows instead of 32. Is it possible to filter the original rows by these group proportions?

Upvotes: 2

Views: 2964

Answers (1)

eipi10

Reputation: 93761

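You can do this in a single chain if you use mutate, rather than summarise, to count the number of rows in each group. Unlike summarise, mutate keeps every original row, so the rows can be filtered directly afterwards. Note that min.freq below is a proportion (0.25), not a percentage.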

min.freq = 0.25

mtcars %>%
  group_by(am, gear) %>%
  mutate(n = n()) %>% 
  group_by(am) %>%
  filter(n/n() > min.freq) %>%
  select(-n)

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
...
26 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
27 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
28 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
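
If you would rather give the threshold as a percentage, as in the question, the same chain can be written roughly as below. This is only a sketch: the names min_pct and kept are made up for illustration, and it uses >= so that only groups strictly below the threshold are dropped.

```r
library(dplyr)

min_pct <- 25  # threshold in percent (hypothetical name, not from the question)

kept <- mtcars %>%
  group_by(am, gear) %>%
  mutate(n = n()) %>%                # size of each (am, gear) group, repeated on every row
  group_by(am) %>%
  mutate(freq = n / n() * 100) %>%   # group size as a percentage of its am group
  filter(freq >= min_pct) %>%        # drop rows from groups below the threshold
  ungroup() %>%
  select(-n, -freq)                  # remove the helper columns

nrow(kept)  # 28 rows: the 4 rows with am = 0, gear = 4 are gone
```

Calling ungroup() at the end is a design choice so that later steps do not silently keep operating per am group.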

Upvotes: 5
