Reputation: 28
This is my first question on Stack Overflow. I have a question about creating a bar plot with three categorical variables in R. I have only been using R for three weeks, so I hope you can help me with this problem.
I have a dataframe that summarizes the number of females and males in two places (place1 and place2) per age group. I am interested in the proportions of males and females in both places and per age group for comparison. The data looks as follows:
# Packages used below
library(dplyr)
library(tidyr)
library(ggplot2)
# Females
data_female <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
                          number_place1 = c(7000, 12000, 15000, 40000, 36000, 10000, 13000),
                          number_place2 = c(163000, 360000, 350000, 800000, 900000, 360000, 370000))
# Extra columns: share of each age group within each place, in percent
data_female <- data_female %>%
  mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
         percentage_place2 = number_place2 / sum(number_place2) * 100,
         gender = "F") %>%
  select(agegroup, percentage_place1, percentage_place2, gender)
# Males
data_male <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
                        number_place1 = c(6000, 13000, 13000, 38000, 37000, 9000, 12000),
                        number_place2 = c(161000, 340000, 320000, 699000, 900230, 330600, 385000))
# Extra columns: share of each age group within each place, in percent
data_male <- data_male %>%
  mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
         percentage_place2 = number_place2 / sum(number_place2) * 100,
         gender = "M") %>%
  select(agegroup, percentage_place1, percentage_place2, gender)
Both dataframes are then combined into one and 'pivot_longer' is used to create a 'long' dataframe:
data <- rbind(data_female, data_male)
data_long <- data %>%
  rename(place1 = percentage_place1, place2 = percentage_place2) %>%
  pivot_longer(cols = c("place1", "place2"), names_to = "place", values_to = "percentage")
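For reference, the same long table can probably be built in a single pipeline by reshaping the raw counts first and computing the percentages per group afterwards. This is only a minimal sketch, assuming the counts for both genders are kept in one data frame; the names `counts`, `data_long_alt` and `check` are hypothetical and do not appear in the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical combined input: one row per age group and gender
ages <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")
counts <- data.frame(
  agegroup = rep(ages, times = 2),
  gender = rep(c("F", "M"), each = length(ages)),
  number_place1 = c(7000, 12000, 15000, 40000, 36000, 10000, 13000,
                    6000, 13000, 13000, 38000, 37000, 9000, 12000),
  number_place2 = c(163000, 360000, 350000, 800000, 900000, 360000, 370000,
                    161000, 340000, 320000, 699000, 900230, 330600, 385000)
)

data_long_alt <- counts %>%
  # number_place1 / number_place2 become rows with place = "place1" / "place2"
  pivot_longer(cols = c(number_place1, number_place2),
               names_to = "place", names_prefix = "number_",
               values_to = "number") %>%
  # Percentages are taken within each gender and place
  group_by(gender, place) %>%
  mutate(percentage = number / sum(number) * 100) %>%
  ungroup() %>%
  select(agegroup, gender, place, percentage)

# Sanity check: the percentages in each gender/place group should add up to 100
check <- data_long_alt %>%
  group_by(gender, place) %>%
  summarise(total = sum(percentage), .groups = "drop")
```

Grouping before `mutate()` avoids repeating the same formula for every place and gender, which is where a wrong column name is easy to introduce.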
In the end I have a dataframe with the following columns: agegroup, gender, place and percentage.
From this dataframe, I want to create a graph that looks exactly like the figure that can be found here:
It is a bar graph with:
For now, I have a figure with code like this:
ggplot(data_long, aes(x = agegroup, y = percentage, fill = interaction(place, gender))) +
  geom_bar(position = 'dodge', stat = 'identity') +
  facet_wrap( ~ place)
This figure has two larger columns, "place1" and "place2" (because of facet_wrap()), but I want to combine them into one column graph as in the example figure. Also, how can I create the nice table underneath the bar graph, as in the example?
I hope it is clear what I mean. Is there someone who has experience with creating such figures?
Upvotes: 1
Views: 1295
Reputation: 174348
You can use the "sneaky facets" approach.
First ensure that your categorical variables are in the desired order:
agelevels <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")
data_long <- data_long %>%
  mutate(agegroup = factor(agegroup, agelevels),
         gender = factor(gender, c("M", "F")))
Then we plot with gender on the x axis and fill according to the interaction between place and gender. We facet by age group along the x axis, removing the spacing between the panels and each panel's border. Finally, we switch the facet strips to the bottom (on the outside) and remove their background so that they look like a secondary x axis:
ggplot(data_long, aes(x = gender, y = percentage,
                      fill = interaction(place, gender))) +
  geom_col(position = 'dodge', color = "gray50") +
  facet_grid( ~ agegroup, switch = "x") +
  scale_fill_manual(values = c("#a8d094", "#9fc0e7", "#97a891", "#95a5c2"),
                    labels = c("Male, place 1", "Male, place 2",
                               "Female, place 1", "Female, place 2")) +
  labs(fill = "", x = "Age group") +
  theme_bw() +
  theme(panel.spacing = unit(0, "points"),
        panel.border = element_blank(),
        axis.line = element_line(),
        strip.placement = "outside",
        strip.background = element_blank(),
        legend.position = "bottom",
        panel.grid.major.x = element_blank())
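Note that the colours and labels in scale_fill_manual() are matched by position to the levels of interaction(place, gender), so the order of the gender factor matters. A small base R sketch of how those levels are ordered, using toy vectors rather than the real data (`fill_levels` is just an illustrative name):

```r
# Toy vectors standing in for the place and gender columns of data_long
place <- c("place1", "place2", "place1", "place2")
gender <- factor(c("M", "M", "F", "F"), levels = c("M", "F"))

# With the default lex.order = FALSE, the first argument varies fastest
fill_levels <- levels(interaction(place, gender))
print(fill_levels)
# [1] "place1.M" "place2.M" "place1.F" "place2.F"
```

This is why the labels are listed as male place 1, male place 2, female place 1, female place 2. If gender were left as a plain character column, "F" would sort before "M" and the labels would need to be reordered accordingly.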
Upvotes: 3