temo

Reputation: 69

Finding the percentage of males and females from a dataframe

I have a dataframe which has the columns

   year sex   name          n   prop
  <dbl> <chr> <chr>     <int>  <dbl>
1  1880 F     Mary       7065 0.0724
2  1880 F     Anna       2604 0.0267
3  1880 F     Emma       2003 0.0205
4  1880 F     Elizabeth  1939 0.0199
5  1880 F     Minnie     1746 0.0179
6  1880 F     Margaret   1578 0.0162

from the babynames library, and I want to find how the babies with a given name are split between the sexes, as percentages. For example, if the name is Anna (a traditionally female name), I want to find out what percentage of all babies named Anna are male and what percentage are female.

I know that I have to filter by name, but past that I'm unsure of how to get the percentage. I tried group_by(year) and group_by(gender) and summarize() but I am not getting what I need. I am unsure of whether or not that is even the correct thing to do.

Edit: I would like to see it by year (say, in 1880 x% were F and the rest were M, and in 1882 y% were F). Thank you.

Upvotes: 1

Views: 1659

Answers (2)

Federico C

Reputation: 132

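Use xtabs to sum the baby counts (n) by sex for the desired name, in this case "Anna", then prop.table to divide by the total number of babies with that name. Calling table on the sex column alone would count rows of the data frame (one per year and sex) rather than babies.

library(babynames)
prop.table(xtabs(n ~ sex, data = babynames, subset = name == "Anna"))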
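
For the per-year breakdown requested in the question's edit, a base R sketch could cross-tabulate the counts by year and sex and then normalise each row. The toy data frame below uses made-up counts and only mimics the babynames columns (year, sex, name, n); it is an assumption for illustration, not real data.

```r
# Toy data with made-up counts, shaped like babynames (year, sex, name, n)
toy <- data.frame(
  year = c(1880, 1880, 1881, 1881, 1881),
  sex  = c("F", "M", "F", "M", "F"),
  name = c("Anna", "Anna", "Anna", "Anna", "Mary"),
  n    = c(90, 10, 75, 25, 50)
)

# Sum n for each year/sex combination, keeping only the rows for "Anna"
counts <- xtabs(n ~ year + sex, data = toy, subset = name == "Anna")

# margin = 1 normalises within each row (year), so each year sums to 100
shares <- prop.table(counts, margin = 1) * 100
print(shares)
```

With the real data, replacing toy with babynames should give one row per year in which the name appears.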

Upvotes: 2

Ronak Shah

Reputation: 388817

You could filter for the name "Anna", sum its counts by sex, and convert the sums to percentages.

library(babynames)
library(dplyr)

babynames %>%
  filter(name == "Anna") %>%
  group_by(sex) %>%
  summarise(n = sum(n)) %>%
  mutate(n = n/sum(n) * 100)

#   sex    n
#  <chr>  <dbl>
#1 F      99.7  
#2 M      0.307
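
To get the split by year, as the question's edit asks, one possible extension is to summarise by year and sex first and then compute the percentage within each year. The helper name sex_share_by_year and the toy data (made-up counts) are hypothetical, introduced only for this sketch; the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical helper: percentage of each sex among babies with a given name,
# computed separately for every year. Assumes df has columns year, sex, name, n.
sex_share_by_year <- function(df, target_name) {
  df %>%
    filter(name == target_name) %>%
    group_by(year, sex) %>%
    summarise(n = sum(n), .groups = "drop") %>%
    group_by(year) %>%                      # percentages are relative to each year
    mutate(pct = n / sum(n) * 100) %>%
    ungroup()
}

# Toy data with made-up counts, shaped like babynames
toy <- tibble(
  year = c(1880, 1880, 1881, 1881, 1881),
  sex  = c("F", "M", "F", "M", "F"),
  name = c("Anna", "Anna", "Anna", "Anna", "Mary"),
  n    = c(90L, 10L, 75L, 25L, 50L)
)

result <- sex_share_by_year(toy, "Anna")
print(result)

# With the real data: sex_share_by_year(babynames::babynames, "Anna")
```

A year in which the name was recorded for only one sex will show a single row with pct equal to 100.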

Upvotes: 2
