Reputation: 69
I have a data frame with the columns
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1880 F Mary 7065 0.0724
2 1880 F Anna 2604 0.0267
3 1880 F Emma 2003 0.0205
4 1880 F Elizabeth 1939 0.0199
5 1880 F Minnie 1746 0.0179
6 1880 F Margaret 1578 0.0162
from the babynames package, and I want to find the percentage of each sex for a given name. For example, for the name Anna (a traditionally female name), I want to find out how many of all babies named Anna are male and how many are female.
I know that I have to filter by name, but past that I'm not sure how to get the percentage. I tried group_by(year) and group_by(gender) with summarize(), but I am not getting what I need, and I'm not sure that is even the right approach.
Edit: I would like to see this by year (say, in 1880 x% were F and the rest M, and in 1882 y% were F). Thank you!
Upvotes: 1
Views: 1659
Reputation: 132
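Build a table of babies by sex and divide it by the total number of babies with the desired name, in this case "Anna". Note that a plain table() on the sex column counts (year, sex) rows rather than babies, so weight each row by its n column, for example with xtabs:
library(babynames)
anna <- babynames[babynames$name == "Anna", ]
# xtabs sums n by sex, so the shares are of babies, not of rows
xtabs(n ~ sex, data = anna) / sum(anna$n)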
Upvotes: 2
Reputation: 388817
You could filter the name "Anna", sum their counts by sex and calculate the percentage.
library(babynames)
library(dplyr)
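babynames %>%
  filter(name == "Anna") %>%      # keep only rows for the name Anna
  group_by(sex) %>%
  summarise(n = sum(n)) %>%       # total babies named Anna per sex, all years pooled
  mutate(n = n/sum(n) * 100)      # convert the counts to percentages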
# sex n
# <chr> <dbl>
#1 F 99.7
#2 M 0.307
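The pipeline above pools all years together, while the question's edit asks for the split per year. A minimal sketch of one way to extend the same dplyr pipeline, assuming dplyr 1.0 or later (for the .groups argument of summarise()): group by both year and sex, then compute each sex's share within its year. Because the data only include names given to at least five babies of a sex in a year, a year with too few boys named Anna should show a single F row at 100%.

```r
library(babynames)
library(dplyr)

# Share of babies named "Anna" who were F vs M, computed separately for each year
anna_by_year <- babynames %>%
  filter(name == "Anna") %>%
  group_by(year, sex) %>%
  summarise(n = sum(n), .groups = "drop_last") %>%  # result is still grouped by year
  mutate(pct = n / sum(n) * 100) %>%                # sum(n) here is the per-year total
  ungroup()

head(anna_by_year)
```

To look up a single year, filter the result, e.g. filter(anna_by_year, year == 1880).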
Upvotes: 2