subsetting a dataset using dplyr filter condition

Question

I want to subset a data frame based on how often a value occurs in one column, while keeping all the columns.

I will explain the question using the msleep dataset.

library(dplyr)
library(ggplot2)  # msleep is a dataset that ships with ggplot2

I looked at the frequency of frequencies of the genus column, i.e. how many genera occur once, twice, and so on.

msleep %>% count(genus) %>% count(n)
## A tibble: 3 × 2
#      n    nn
#  <int> <int>
#1     1    73
#2     2     2
#3     3     2

I would like to extract all the rows of the original data frame whose genus occurs exactly twice.

msleep %>% count(genus) %>% filter(n==2)
## A tibble: 2 × 2
#   genus     n
#   <chr> <int>
#1  Equus     2
#2 Vulpes     2

How can I get the expected output below without typing the genus names by hand?

Expected output:

msleep[msleep$genus %in% c('Equus','Vulpes'),]
## A tibble: 4 × 11
#        name  genus  vore          order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
#       <chr>  <chr> <chr>          <chr>        <chr>       <dbl>     <dbl>       <dbl> <dbl>   <dbl>  <dbl>
#1      Horse  Equus herbi Perissodactyla domesticated         2.9       0.6        1.00  21.1  0.6550 521.00
#2     Donkey  Equus herbi Perissodactyla domesticated         3.1       0.4          NA  20.9  0.4190 187.00
#3 Arctic fox Vulpes carni      Carnivora         <NA>        12.5        NA          NA  11.5  0.0445   3.38
#4    Red fox Vulpes carni      Carnivora         <NA>         9.8       2.4        0.35  14.2  0.0504   4.23

Any alternative ways of getting the expected output are also appreciated.

P.S. Is there a better way to look at the frequency of frequencies, or to apply the filter condition (here: n == 2)?

akrun · Accepted Answer

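We can use group_by and then filter directly instead of going through the count approach. Within each group, n() gives the number of rows, so the condition keeps every row of a genus that occurs exactly twice.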

msleep %>%
  group_by(genus) %>%
  filter(n() == 2)
# A tibble: 4 x 11
# Groups: genus [2]
#        name  genus  vore          order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
#       <chr>  <chr> <chr>          <chr>        <chr>       <dbl>     <dbl>       <dbl> <dbl>   <dbl>  <dbl>
#1      Horse  Equus herbi Perissodactyla domesticated         2.9       0.6        1.00  21.1  0.6550 521.00
#2     Donkey  Equus herbi Perissodactyla domesticated         3.1       0.4          NA  20.9  0.4190 187.00
#3 Arctic fox Vulpes carni      Carnivora         <NA>        12.5        NA          NA  11.5  0.0445   3.38
#4    Red fox Vulpes carni      Carnivora         <NA>         9.8       2.4        0.35  14.2  0.0504   4.23
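
Since the question also asks for alternatives, here is a rough sketch of a few other routes that should give the same four rows. It assumes dplyr and ggplot2 are installed; the object names (by_add_count, twice, by_semi_join, group_size, by_base) are made up for illustration and are not part of the original answer.

```r
library(dplyr)
library(ggplot2)  # only needed for the msleep dataset

# Alternative 1: add_count() appends the group size as column n
# without collapsing rows, so a plain filter() works afterwards.
by_add_count <- msleep %>%
  add_count(genus) %>%
  filter(n == 2) %>%
  select(-n)  # drop the helper column again

# Alternative 2: keep the count() step from the question and
# semi_join() it back onto the full data.
twice <- msleep %>% count(genus) %>% filter(n == 2)
by_semi_join <- msleep %>% semi_join(twice, by = "genus")

# Alternative 3: base R. ave() returns, for every row, the size of
# the group that row belongs to.
group_size <- ave(seq_along(msleep$genus), msleep$genus, FUN = length)
by_base <- msleep[group_size == 2, ]

# Frequency of frequencies in base R, as asked in the P.S.
table(table(msleep$genus))
```

Unlike the group_by version, none of these results is a grouped tibble, so no ungroup() is needed before further processing. Note that the add_count route relies on msleep not already having a column called n.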
