I want to subset a data frame, keeping all of its columns, based on how often each value occurs in one column.
I will explain the question using the msleep dataset.
library(dplyr)
library(ggplot2)  # msleep is a data set shipped with ggplot2, not a package
I counted the frequency of the frequencies of the genus column, to see how the per-genus counts are distributed:
msleep %>% count(genus) %>% count(n)
## A tibble: 3 × 2
# n nn
# <int> <int>
#1 1 73
#2 2 2
#3 3 2
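A base-R cross-check of the same frequency-of-frequencies table may be useful. This is a sketch that only assumes msleep is loaded from ggplot2; the names genus_counts and freq_of_freq are illustrative.

```r
library(ggplot2)  # provides the msleep data set

# Number of rows (animals) per genus
genus_counts <- table(msleep$genus)

# Number of genera sharing each per-genus count: the "frequency of frequencies"
freq_of_freq <- table(genus_counts)
freq_of_freq
```

Each entry of freq_of_freq should match the nn column of the dplyr output above.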
I would like to extract all rows of the original data frame whose genus occurs exactly twice.
msleep %>% count(genus) %>% filter(n==2)
## A tibble: 2 × 2
# genus n
# <chr> <int>
#1 Equus 2
#2 Vulpes 2
How can I get the expected output below?
Expected output:
msleep[msleep$genus %in% c('Equus','Vulpes'),]
## A tibble: 4 × 11
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Horse Equus herbi Perissodactyla domesticated 2.9 0.6 1.00 21.1 0.6550 521.00
#2 Donkey Equus herbi Perissodactyla domesticated 3.1 0.4 NA 20.9 0.4190 187.00
#3 Arctic fox Vulpes carni Carnivora <NA> 12.5 NA NA 11.5 0.0445 3.38
#4 Red fox Vulpes carni Carnivora <NA> 9.8 2.4 0.35 14.2 0.0504 4.23
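One way to finish the count-then-filter idea without typing 'Equus' and 'Vulpes' by hand is a semi-join. This is a sketch assuming dplyr and ggplot2 (for msleep) are installed; the object names twice and result are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

# Genera that occur exactly twice, computed from the data
twice <- msleep %>%
  count(genus) %>%
  filter(n == 2)

# Keep every row (and every column) of msleep whose genus appears in `twice`
result <- msleep %>%
  semi_join(twice, by = "genus")

# Base-R equivalent of the hard-coded subset, with the genera computed
result_base <- msleep[msleep$genus %in% twice$genus, ]
```

semi_join() returns only the columns of its first argument, so the helper column n from count() does not leak into the result.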
Alternative ways of getting the expected output are also appreciated.
P.S. Are there better ways to look at the frequency of frequencies, or to express the filter condition (here: n == 2)?
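For the filter condition, one option is to keep the per-genus count as an ordinary column so that any condition on it can be written directly. This is a sketch assuming dplyr's add_count() and msleep from ggplot2; the object names are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

# add_count() appends the per-genus count as column `n` without collapsing rows,
# so the condition (n == 2, n >= 2, n %in% 2:3, ...) is a plain filter()
exactly_two <- msleep %>%
  add_count(genus) %>%
  filter(n == 2) %>%
  select(-n)  # drop the helper column again

# Base-R alternative: look up the genera whose table() count equals 2
keep <- names(which(table(msleep$genus) == 2))
exactly_two_base <- msleep[msleep$genus %in% keep, ]
```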
We can use group_by and then filter directly, instead of going through the count approach:
msleep %>%
  group_by(genus) %>%
  filter(n() == 2)
# A tibble: 4 x 11
# Groups: genus [2]
# name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
# <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Horse Equus herbi Perissodactyla domesticated 2.9 0.6 1.00 21.1 0.6550 521.00
#2 Donkey Equus herbi Perissodactyla domesticated 3.1 0.4 NA 20.9 0.4190 187.00
#3 Arctic fox Vulpes carni Carnivora <NA> 12.5 NA NA 11.5 0.0445 3.38
#4 Red fox Vulpes carni Carnivora <NA> 9.8 2.4 0.35 14.2 0.0504 4.23
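As the "# Groups: genus [2]" line shows, the result is still grouped. A sketch of two ways to get an ungrouped tibble; the .by argument assumes dplyr 1.1.0 or later, and the object names are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

# Same approach as above, with the grouping removed afterwards
res_grouped <- msleep %>%
  group_by(genus) %>%
  filter(n() == 2) %>%
  ungroup()

# dplyr >= 1.1.0: per-operation grouping; the result comes back ungrouped
res_by <- msleep %>%
  filter(n() == 2, .by = genus)
```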