Reputation: 69
I'm using the babynames package to find out when a certain name (like Alex) was closest to having an equal number of male and female babies with that name.
This is what I currently have, but I'm not sure what math is needed to find the year when the name was most unisex, since it probably was never a perfect 50/50 split.
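library(babynames)
library(dplyr)

Alex <- babynames %>%
  filter(name == "Alex", year >= 1920) %>%
  group_by(year, sex) %>%
  summarise(n = sum(n)) %>%
  # still grouped by year here, so this is each sex's percentage within the year
  mutate(n = n / sum(n) * 100)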
Thank you.
Upvotes: 0
Views: 51
Reputation: 6483
Graphically:
library(babynames)
library(dplyr)
library(ggplot2)
babynames %>%
  filter(name == "Alex", year >= 1920) %>%
  ggplot(aes(year, n, color = sex)) +
  geom_line()
Numerically:
library(tidyr)
babynames %>%
  filter(name == "Alex", year >= 1920) %>%
  group_by(year) %>%
  mutate(pct = n / sum(n, na.rm = TRUE)) %>%
  ungroup() %>%
  select(year, name, pct, sex) %>%
  pivot_wider(names_from = sex, values_from = pct) %>%
  mutate(diff = abs(F - M)) %>%
  arrange(diff)
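On the "what math" part: within a year the two shares add up to 1, so abs(F - M) equals abs(2 * F - 1), and the smallest diff is the year whose female share is closest to 0.5. If only that single year is wanted rather than the whole sorted table, something like the sketch below might do. It assumes dplyr 1.0.0 or later for slice_min(); the name closest is just an illustrative variable, not something from the question.

```r
library(babynames)
library(dplyr)
library(tidyr)

closest <- babynames %>%
  filter(name == "Alex", year >= 1920) %>%
  group_by(year) %>%
  mutate(pct = n / sum(n)) %>%          # share of each sex within the year
  ungroup() %>%
  select(year, name, pct, sex) %>%
  pivot_wider(names_from = sex, values_from = pct) %>%
  mutate(diff = abs(F - M)) %>%         # 0 would be a perfect 50/50 split
  slice_min(diff, n = 1, with_ties = FALSE)  # keep only the most even year

closest
```

Years where only one sex is recorded end up with an NA diff, which slice_min() sorts last, so they should not be picked as long as at least one year has both sexes.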
For all names:
babynames %>%
  filter(year >= 1920) %>%
  group_by(name, year) %>%
  mutate(pct = n / sum(n, na.rm = TRUE),
         total = sum(n)) %>%
  ungroup() %>%
  select(year, name, total, pct, sex) %>%
  pivot_wider(names_from = sex, values_from = pct) %>%
  mutate(diff = abs(F - M)) %>%
  arrange(diff)
Not sure about this data set though ;)
babynames %>%
  filter(name == "Othello", year == 1920)

   year sex   name        n       prop
  <dbl> <chr> <chr>   <int>      <dbl>
1  1920 F     Othello     8 0.00000643
2  1920 M     Othello     8 0.00000727
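Rows like the Othello one give an exact 50/50 split from only 8 babies per sex, so they will sit at the top of the all-names ranking. One possible way around that is to require a minimum number of babies per name and year before ranking. This is only a sketch: min_total and its value of 1000 are an arbitrary, hypothetical cutoff, not something the data set suggests.

```r
library(babynames)
library(dplyr)
library(tidyr)

min_total <- 1000  # hypothetical cutoff, adjust as needed

unisex <- babynames %>%
  filter(year >= 1920) %>%
  group_by(name, year) %>%
  mutate(pct = n / sum(n),
         total = sum(n)) %>%            # babies of both sexes for this name and year
  ungroup() %>%
  filter(total >= min_total) %>%        # drop name/year pairs with few babies
  select(year, name, total, pct, sex) %>%
  pivot_wider(names_from = sex, values_from = pct) %>%
  mutate(diff = abs(F - M)) %>%
  arrange(diff)

head(unisex)
```

Filtering on total happens before pivot_wider(), so both sexes of a name/year pair are kept or dropped together.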
Upvotes: 3