Reputation: 143
I have a data set of popular baby names going back to 1880. I am trying to find the timelessly popular baby names, meaning the names that rank among the 30 most common for their gender in every year of the data.
I have tried using group_by, top_n, and filter, but I am not very well versed in dplyr yet, so I am unsure of the proper order of operations here.
library(babynames)
timeless <- babynames %>% group_by(name, sex, year) %>% top_n(30) %>% filter()
I am getting a large data table back with the 30 most common names for each year of data, but I want to compare across years to find the names that are in the top 30 in every year. My professor hinted that there should be four timeless boy names and one timeless girl name. Any help is appreciated!
Upvotes: 2
Views: 153
Reputation: 39174
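Group by sex and year only, keep the top 30 names in each group, then count how many years each name appears in and keep the names that appear in every year.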
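library(babynames)
library(dplyr)

timeless <- babynames %>%
  group_by(sex, year) %>%   # rank names within each sex and year
  top_n(30) %>%             # no wt given, so top_n() ranks by the last column (prop)
  ungroup() %>%
  count(sex, name) %>%      # number of years each name made the top 30
  # keep names that appear in every year of the data
  filter(n == max(babynames$year) - min(babynames$year) + 1)
timeless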
# # A tibble: 5 x 3
#   sex   name          n
#   <chr> <chr>     <int>
# 1 F     Elizabeth   138
# 2 M     James       138
# 3 M     John        138
# 4 M     Joseph      138
# 5 M     William     138
Regarding your original code, group_by(name, sex, year) %>% top_n(30) does not make sense: every combination of name, sex, and year is unique, so each group contains a single row and there is nothing for top_n(30) to filter.
Upvotes: 1