arne

Reputation: 28

R: Creating advanced bar plot with three categorical variables + summarizing table attached at the bottom of figure

This is my very first question on Stack Overflow. I have a question about creating a bar plot with three categorical variables in R. I have only been using R for three weeks, so I hope you can help me with this problem.

I have a dataframe that summarizes the number of females and males in two places (place1 and place2) per age group. I am interested in the proportions of males and females in both places and per age group for comparison. The data looks as follows:

library(dplyr)  # mutate(), select(), rename(), %>%
library(tidyr)  # pivot_longer()

# Females
data_female <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
                          number_place1 = c(7000, 12000, 15000,40000, 36000, 10000, 13000),
                          number_place2 = c(163000, 360000, 350000,800000, 900000, 360000, 370000))
# Extra columns: share of each age group within each place, in percent
data_female <- data_female %>%
               mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
                      percentage_place2 = number_place2 / sum(number_place2) * 100,
                      gender = "F") %>%
               select(agegroup, percentage_place1, percentage_place2, gender)

# Males
data_male <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
                          number_place1 = c(6000, 13000, 13000,38000, 37000, 9000, 12000),
                          number_place2 = c(161000, 340000, 320000,699000, 900230, 330600, 385000))
# Extra columns: share of each age group within each place, in percent
data_male <- data_male %>%
               mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
                      percentage_place2 = number_place2 / sum(number_place2) * 100,
                      gender = "M") %>%
               select(agegroup, percentage_place1, percentage_place2, gender)
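
As a quick sanity check on the percentage columns, each one should add up to 100 within a gender and place. A minimal base-R sketch using the female place1 counts from above (the standalone variable names are only for illustration):

```r
# Female counts for place1, copied from the data frame above
number_place1 <- c(7000, 12000, 15000, 40000, 36000, 10000, 13000)

# Share of each age group in percent
percentage_place1 <- number_place1 / sum(number_place1) * 100

round(percentage_place1, 1)

# The shares should add up to 100 (up to floating-point error)
all.equal(sum(percentage_place1), 100)
```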

Both dataframes are then combined into one and 'pivot_longer' is used to create a 'long' dataframe:

data <- rbind(data_female, data_male)

data_long <- data %>%
              rename(place1 = percentage_place1, place2 = percentage_place2) %>%
              pivot_longer(cols = c("place1","place2"),names_to = "place", values_to = "percentage")

In the end I have a dataframe with the following columns: agegroup, gender, place and percentage.

From this dataframe, I want to create a graph that looks exactly like the figure that can be found here:

enter image description here

It is a bar graph with:

For now, I have a figure with code like this:

ggplot(data_long, aes(x = agegroup, y = percentage, fill = interaction(place, gender))) +
  geom_bar(position = 'dodge', stat = 'identity') +
  facet_wrap( ~ place)

This figure has two panels, "place1" and "place2" (because of facet_wrap()), but I want to combine them into a single bar graph, as in the example figure. Also, how can I create the table underneath the bar graph, as in the example?

I hope it is clear what I mean. Is there someone who has experience with creating such figures?

Upvotes: 1

Views: 1295

Answers (1)

Allan Cameron

Reputation: 174348

You can use the "sneaky facets" approach.

First ensure that your categorical variables are in the desired order:

agelevels <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")
data_long <- data_long %>% mutate(agegroup = factor(agegroup, agelevels),
                                  gender = factor(gender, c("M", "F")))

Then we plot with gender on the x axis, and fill according to the interaction between place and gender. We then facet by age group along the x axis, removing the spacing between the panels and each panel's border. Finally, we move the facet strips to the bottom (outside the axis) and remove their background so that they look like a secondary x axis:

ggplot(data_long, aes(x = gender, y = percentage, 
                      fill = interaction(place, gender))) +   
  geom_col(position = 'dodge', color = "gray50") +
  facet_grid( ~ agegroup, switch = "x") +
  scale_fill_manual(values = c("#a8d094", "#9fc0e7", "#97a891", "#95a5c2"),
                    labels = c("Male, place 1", "Male, place 2",
                               "Female, place 1", "Female, place 2")) +
  labs(fill = "", x = "Age group") +
  theme_bw() +
  theme(panel.spacing = unit(0, "points"),
        panel.border = element_blank(),
        axis.line = element_line(),
        strip.placement = "outside",
        strip.background = element_blank(),
        legend.position = "bottom",
        panel.grid.major.x = element_blank())
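
The order of the labels in scale_fill_manual() has to match the level order that interaction() produces, in which the first factor varies fastest. A small base-R sketch with toy factors mirroring the place and gender columns (the object names are only for illustration):

```r
# Toy factors mirroring the columns used in the plot
place  <- factor(c("place1", "place2", "place1", "place2"))
gender <- factor(c("M", "M", "F", "F"), levels = c("M", "F"))

# Same grouping as fill = interaction(place, gender)
fill_groups <- interaction(place, gender)

levels(fill_groups)
# "place1.M" "place2.M" "place1.F" "place2.F"
```

This is why the colours and labels are listed as male place 1, male place 2, female place 1, female place 2.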

enter image description here
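
The plot above does not cover the table underneath the figure. One possible way to add it, shown only as a rough sketch, is to draw the table as a grob and stack it below the plot. This assumes the gridExtra package is installed; the toy data frame and the object names are made up for illustration, and the table columns will not line up exactly with the bars without further tuning.

```r
library(ggplot2)
library(gridExtra)  # assumption: installed; not used in the answer above

# Small made-up data set standing in for data_long
toy <- data.frame(
  agegroup   = factor(rep(c("0-4", "5-14"), each = 2), levels = c("0-4", "5-14")),
  gender     = factor(rep(c("M", "F"), times = 2), levels = c("M", "F")),
  percentage = c(5, 6, 10, 9)
)

p <- ggplot(toy, aes(x = gender, y = percentage, fill = gender)) +
  geom_col() +
  facet_grid(~ agegroup, switch = "x")

# Wide summary table: one row per gender, one column per age group
tab_df <- as.data.frame.matrix(xtabs(percentage ~ gender + agegroup, data = toy))

# Turn the table into a grob and stack it under the plot;
# the relative heights are chosen by eye
tbl <- tableGrob(tab_df)
combined <- arrangeGrob(p, tbl, ncol = 1, heights = c(3, 1))

grid::grid.newpage()
grid::grid.draw(combined)
```

With the real data, data_long would replace toy and the fill and theme settings from the plot above would carry over unchanged.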

Upvotes: 3
