I'm a beginner at R and am having trouble making a plot that compares how male and female respondents rate their health status ("Poor", "Fair", "Good", "Very Good", "Excellent"). The problem is that there are more female than male respondents, so I wrote a bit of code to try to convert the y-axis from counts to percentages. Can someone please help? Here is the code:
brfss2013 %>%
filter(!is.na(sex))%>%
count(sex) %>%
mutate(perc = n / nrow(brfss2013)) -> brfss2
brfss2013%>%
filter(!is.na(sex))%>%
filter(!is.na(genhlth))%>%
group_by(sex, genhlth)%>%
ggplot(brfss2013, mapping = aes(x = genhlth) + geom_bar(aes(fill = brfss2$sex),position =
"dodge") + scale_fill_brewer("Gender") + labs(title = "Reported generalhealth - by gender", x =
"general health - reported")
@Tania - welcome to SO!
It is not entirely clear what your final desired plot should look like. But here is one potential way to pursue this. This example is based on BRFSS 2018 data.
First, it appears you would like to remove rows with NA.
Second, you can filter on those that have the expected SEX1 and GENHLTH values, to exclude those that answered "not sure" or "refused."
Next, you can group_by both columns and compute the percentages. Note that the order in your group_by matters: summarise drops the last grouping variable, so sum(n) is then taken within the first one, and swapping the order gives different results. With the grouping here, I wanted the percentages to add up to 100 across GENHLTH (for each sex).
Finally, you can plot using the percentage as the vertical axis.
library(tidyverse)

BRFSS_b %>%
  drop_na() %>%                          # remove rows containing NA
  filter(SEX1 == 1 | SEX1 == 2,          # keep only the expected sex codes
         GENHLTH >= 1 & GENHLTH <= 5) %>%  # drop "not sure" / "refused"
  group_by(SEX1, GENHLTH) %>%
  summarise(n = n()) %>%                 # counts; result is still grouped by SEX1
  mutate(perc = n * 100 / sum(n)) %>%    # percent within each sex
  ggplot(mapping = aes(x = factor(GENHLTH), y = perc, fill = factor(SEX1))) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_brewer("Gender") +
  labs(title = "Reported general health - by gender",
       x = "general health - reported", y = "percent")
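If you would rather see readable labels than numeric codes on the axis and in the legend, one option is to convert the codes to labelled factors before plotting. This is only a sketch on a made-up toy data frame: the column names sex and health are hypothetical, and the code-to-label mapping (SEX1 1 = Male, 2 = Female; GENHLTH 1 = Excellent through 5 = Poor) is an assumption that should be checked against the BRFSS codebook.

```r
library(dplyr)

# Toy stand-in for the two BRFSS columns; values are made up for illustration.
toy <- data.frame(
  SEX1    = c(1, 1, 2, 2, 2, 1),
  GENHLTH = c(1, 3, 2, 5, 4, 2)
)

# ASSUMED coding (verify against the codebook):
# SEX1: 1 = Male, 2 = Female; GENHLTH: 1 = Excellent ... 5 = Poor.
toy_labelled <- toy %>%
  mutate(
    sex    = factor(SEX1, levels = 1:2, labels = c("Male", "Female")),
    health = factor(GENHLTH, levels = 1:5,
                    labels = c("Excellent", "Very good", "Good", "Fair", "Poor"))
  )
```

You could then group by and map sex and health in place of factor(SEX1) and factor(GENHLTH).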
Output
The data generated for ggplot looks like this:
# A tibble: 10 x 4
# Groups:   SEX1 [2]
    SEX1 GENHLTH     n  perc
   <dbl>   <dbl> <int> <dbl>
 1     1       1 33272 16.9
 2     1       2 63670 32.3
 3     1       3 63411 32.2
 4     1       4 26554 13.5
 5     1       5  9962  5.06
 6     2       1 38454 16.1
 7     2       2 78260 32.8
 8     2       3 74531 31.3
 9     2       4 34053 14.3
10     2       5 13057  5.48
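To see why the order inside group_by matters, here is a small sketch on a made-up data frame (the toy data and the names by_sex and by_health are hypothetical). It spells out the grouping that summarise leaves behind with .groups = "drop_last", which assumes dplyr 1.0 or later.

```r
library(dplyr)

# Made-up responses: 3 male, 4 female.
toy <- data.frame(
  sex    = c("M", "M", "M", "F", "F", "F", "F"),
  health = c("Good", "Good", "Poor", "Good", "Poor", "Poor", "Poor")
)

# sex first: after summarise() the data are grouped by sex only,
# so sum(n) is per sex and perc adds up to 100 within each sex.
by_sex <- toy %>%
  group_by(sex, health) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(perc = 100 * n / sum(n))

# health first: now sum(n) is per health category,
# so perc adds up to 100 within each health rating instead.
by_health <- toy %>%
  group_by(health, sex) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(perc = 100 * n / sum(n))
```

In by_sex, 75% of the female respondents report "Poor"; in by_health, 75% of the "Poor" responses come from female respondents. The numbers coincide in this toy data, but they answer different questions.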
Edit 3/23/20:
If you want to plot counts instead of percentages, you can use the following for the ggplot call at the end of the pipeline. You will probably need to add stat = "identity" to geom_bar and make sure the x and fill variables are factors (if not already converted); the count n itself should stay numeric.
ggplot(mapping = aes(x = factor(GENHLTH), y = n)) +
  geom_bar(stat = "identity", aes(fill = factor(SEX1)), position = "dodge") +
  scale_fill_brewer("Gender") +
  labs(title = "General health by gender", x = "reported general health")
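As a side note, geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), so the counts plot can be written a little more compactly. The sketch below is self-contained and uses four of the counts from the summary table above; the names counts and p are just for illustration.

```r
library(ggplot2)

# Counts copied from the summary table above (GENHLTH 1 and 2 only).
counts <- data.frame(
  SEX1    = c(1, 1, 2, 2),
  GENHLTH = c(1, 2, 1, 2),
  n       = c(33272, 63670, 38454, 78260)
)

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity").
p <- ggplot(counts, aes(x = factor(GENHLTH), y = n, fill = factor(SEX1))) +
  geom_col(position = "dodge") +
  scale_fill_brewer("Gender") +
  labs(title = "General health by gender",
       x = "reported general health", y = "count")
```

Printing p (or calling print(p)) draws the plot.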