Tania Pescarini

Reputation: 21

ggplot y axis - mutating count to percentages

I'm a beginner at R and am having trouble making a plot that compares how male and female respondents rate their health status ("Poor", "Fair", "Good", "Very Good", "Excellent"). The problem is that there are more female than male respondents, so I tried to mutate the y-axis from counts to percentages. Can someone please help? Here is the code:

brfss2013 %>% 
filter(!is.na(sex))%>%
count(sex) %>% 
mutate(perc = n / nrow(brfss2013)) -> brfss2

brfss2013%>%
filter(!is.na(sex))%>% 
filter(!is.na(genhlth))%>% 
group_by(sex, genhlth)%>%  
ggplot(brfss2013, mapping = aes(x = genhlth) + geom_bar(aes(fill = brfss2$sex),position = 
"dodge") + scale_fill_brewer("Gender") + labs(title = "Reported generalhealth - by gender", x = 
"general health - reported")

Upvotes: 2

Views: 313

Answers (1)

Ben

Reputation: 30559

@Tania - welcome to SO!

It is not entirely clear what your final desired plot should look like. But here is one potential way to pursue this. This example is based on BRFSS 2018 data.

First, it appears you would like to remove rows with NA.

Second, you can filter on those that have the expected SEX1 and GENHLTH values, to exclude those that answered "not sure" or "refused."

Next, you can group_by both columns and compute the percentages. Note that the order of the columns in group_by matters: summarise drops the last grouping variable, so the sum(n) in the following mutate is taken within the first one. With the grouping used here, the percentages add up to 100 across GENHLTH for each sex.

Finally, you can plot using the percentage as the vertical axis.

library(tidyverse)

BRFSS_b %>%
  # remove rows with missing values
  drop_na() %>%
  # keep only the expected codes (drops "not sure" and "refused")
  filter(SEX1 == 1 | SEX1 == 2,
         GENHLTH >= 1 & GENHLTH <= 5) %>%
  group_by(SEX1, GENHLTH) %>%
  # count rows per SEX1/GENHLTH pair; the result stays grouped by SEX1
  summarise(n = n()) %>%
  # percentage within each SEX1 group
  mutate(perc = n * 100 / sum(n)) %>%
  ggplot(mapping = aes(x = factor(GENHLTH), y = perc, fill = factor(SEX1))) +
    geom_bar(stat = "identity", position = position_dodge()) +
    scale_fill_brewer("Gender") +
    labs(title = "Reported general health - by gender",
         x = "general health - reported",
         y = "percent")

Output

[Figure: plot of percentage vs. general health, by sex]

The data generated for ggplot looks like this:

# A tibble: 10 x 4
# Groups:   SEX1 [2]
    SEX1 GENHLTH     n  perc
   <dbl>   <dbl> <int> <dbl>
 1     1       1 33272 16.9 
 2     1       2 63670 32.3 
 3     1       3 63411 32.2 
 4     1       4 26554 13.5 
 5     1       5  9962  5.06
 6     2       1 38454 16.1 
 7     2       2 78260 32.8 
 8     2       3 74531 31.3 
 9     2       4 34053 14.3 
10     2       5 13057  5.48
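
To see why the order in group_by matters, here is a minimal sketch on made-up data (the toy tibble and its column names are hypothetical, not BRFSS values). It assumes dplyr 1.0.0 or later for the .groups argument, which only makes the default "drop the last grouping variable" behaviour explicit.

```r
library(dplyr)

# Hypothetical toy data: 3 men and 5 women with made-up health ratings
toy <- tibble(
  sex     = c("M", "M", "M", "F", "F", "F", "F", "F"),
  genhlth = c("Good", "Good", "Fair", "Good", "Fair", "Fair", "Fair", "Good")
)

# sex first: after summarise() the data is still grouped by sex,
# so sum(n) is the number of respondents of that sex
by_sex <- toy %>%
  group_by(sex, genhlth) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(perc = n * 100 / sum(n)) %>%
  ungroup()

# genhlth first: sum(n) is now the number of respondents giving that rating
by_health <- toy %>%
  group_by(genhlth, sex) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(perc = n * 100 / sum(n)) %>%
  ungroup()

print(by_sex)
print(by_health)
```

In by_sex the percentages add up to 100 within each sex, which is what corrects for unequal group sizes; in by_health they add up to 100 within each rating instead.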

Edit 3/23/20:

If you want to plot counts instead of percentages, you can use the following for ggplot, with the summarised data piped in as before. Keep stat = "identity" in geom_bar, and make sure the x and fill variables are factors (if not already converted). The count n itself should stay numeric so that the bar heights reflect the counts.

ggplot(mapping = aes(x = factor(GENHLTH), y = n)) + 
  geom_bar(stat = "identity", aes(fill = factor(SEX1)), position = "dodge") + 
  scale_fill_brewer("Gender") + 
  labs(title = "General health by gender", x = "reported general health", y = "count")
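
Since the ggplot call above has no data argument, it only works with a summarised data frame piped into it. A self-contained sketch of that, using made-up counts (the counts tibble is hypothetical and not taken from BRFSS), might look like this. It uses geom_col(), which is the shorthand for geom_bar(stat = "identity"), and keeps n numeric so that bar heights are proportional to the counts.

```r
library(dplyr)
library(ggplot2)

# Made-up counts for illustration only (not BRFSS figures)
counts <- tibble(
  SEX1    = c(1, 1, 2, 2),
  GENHLTH = c(1, 2, 1, 2),
  n       = c(30L, 50L, 40L, 70L)
)

p <- counts %>%
  ggplot(mapping = aes(x = factor(GENHLTH), y = n, fill = factor(SEX1))) +
  geom_col(position = "dodge") +  # same as geom_bar(stat = "identity")
  scale_fill_brewer("Gender") +
  labs(title = "General health by gender",
       x = "reported general health",
       y = "count")

print(p)
```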

Upvotes: 1
