Reputation: 548
I have two (in reality - more) dataframes:
(edit - the current answer does not answer my question)
sex <- data.frame(sex = c("M", "F")
,n = c(25, 30))
age <- data.frame(age = c("20-40","40-60","60-80")
,n = c(18, 30, 25))
I would like to produce a single stacked bar chart that will show information from both of these dataframes.
The final plot should look something like this:
I imagine I would first merge these dataframes of unequal lengths, filling the missing rows with NAs. I am asking about the plot rather than about merging because I am hoping there might be a ggplot solution that does not require merging at all.
EDIT
Following the first answer, I should add one more desired trait for the plot, not explicitly stated until now: the bars should share the same colour palette, as in the example plot above.
EDIT2
The exact colours are not important to me and carry no meaning. However, my real data will have more bars than just two, and I do not want to produce a figure with 40 different colours. As you can see in my example plot, I do not display a legend; instead, each category is written as text on top of its bar segment. This keeps the plot clear despite having no legend and despite the shared colours.
Upvotes: 0
Views: 230
Reputation: 23757
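This is mainly about preparing your data for the plot. It assumes that your data frames all share the regular structure spelled out in the code comments: two columns, one named after the category and one holding the values. If your data frames are not built like that, you need to restructure them first.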
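To make that structural assumption concrete, here is a minimal sketch of a check you could run on each data frame before plotting. The helper name check_categ_df is hypothetical (it is not part of any package), and it assumes the value column is always called n, as in your example.

```r
## Hypothetical helper, not from any package: checks the layout assumed here,
## i.e. exactly two columns, one named after the category and one named "n".
check_categ_df <- function(df, categ) {
  is.data.frame(df) &&
    ncol(df) == 2 &&
    setequal(names(df), c(categ, "n"))
}

sex <- data.frame(sex = c("M", "F"), n = c(25, 30))
age <- data.frame(age = c("20-40", "40-60", "60-80"), n = c(18, 30, 25))

check_categ_df(sex, "sex")  # data frame matches its category name
check_categ_df(age, "sex")  # category name does not match the column
```

The first call passes and the second fails, because age has no column called sex.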
Disclaimer: I cannot endorse this type of visualisation, the below is just about demonstrating one way to get your desired result.
Why do I think this isn't a good idea? Using the same colors within one plot suggests a connection between variables that may not exist. For example, in your plot, we would be inclined to believe that all subjects aged 20-40 are female and all those aged 40-60 are male, because they share a color. This can send misleading messages and can be dangerous, especially during data exploration (which is what you seem to have in mind; otherwise there should be no need to produce a series of so many graphs).
library(tidyverse)
sex <- data.frame(sex = c("M", "F")
,n = c(25, 30))
age <- data.frame(age = c("20-40","40-60","60-80")
,n = c(18, 30, 25))
## add a "meaningful" mapping variable to color,
## assumptions:
## - your data frames are named like the desired category,
## - they are arranged in the order you would like to have the columns stacked
## - they have two columns
## - one column named after the desired category
## - the other with the values
## first sort your sex column according to your desired output.
sex <- sex[nrow(sex):1,]
my_categ <- c("sex", "age")
map(my_categ, ~{
  get(as.name(.x)) %>%
    ## convert to factor and use levels for fill
    mutate(order = as.integer(fct_inorder(.data[[.x]])))
}) %>%
  ## bind to one data frame
  bind_rows() %>%
  ## make both sex and age one variable
  pivot_longer(my_categ) %>%
  ## remove NAs
  drop_na(value) %>%
  ggplot() +
  ## use order as fill
  geom_col(aes(n, name, fill = order)) +
  ## add the labels
  geom_text(aes(n, name, label = value),
            position = position_stack(vjust = .5)) +
  theme(legend.position = "none")
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(my_categ)` instead of `my_categ` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
Created on 2022-06-28 by the reprex package (v2.0.1)
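The message in the output above comes from passing the character vector my_categ directly to pivot_longer(). As the message itself suggests, wrapping the vector in all_of() avoids it. A minimal sketch, assuming the tidyr package is installed; the data frame df is made up for illustration and only mimics the shape you get after bind_rows():

```r
library(tidyr)

my_categ <- c("sex", "age")

## made-up data in the shape produced by bind_rows(): one category column
## is filled per row, the other one is NA
df <- data.frame(
  sex = c("F", NA),
  age = c(NA, "20-40"),
  n   = c(30, 18)
)

## all_of() states explicitly that my_categ is an external character vector
long <- pivot_longer(df, all_of(my_categ))

## the NA rows are what drop_na(value) removes in the pipeline
long <- drop_na(long, value)
long
```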
Maybe you meant "they are not stacked to a 100%". This is easily achievable by using position = "fill".
## same data transformation as above piped into the plot... %>%
ggplot() +
geom_col(aes(n, name, fill = order), position = "fill") +
geom_text(aes(n, name, label = value),
position = position_fill(vjust = .5))+
theme(legend.position = "none")
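For reference, position = "fill" rescales each bar so that its segments sum to 1, which means every segment is drawn with a width of n / sum(n) within its own bar. A small base R sketch of that arithmetic, using the counts from the question (the data frame df and the column prop exist only for this illustration):

```r
## counts from the question, already in long format
df <- data.frame(
  name  = c("sex", "sex", "age", "age", "age"),
  value = c("F", "M", "20-40", "40-60", "60-80"),
  n     = c(30, 25, 18, 30, 25)
)

## share of each segment within its own bar (grouped by name)
df$prop <- ave(df$n, df$name, FUN = function(x) x / sum(x))

df
## every bar adds up to 1, regardless of its total count
tapply(df$prop, df$name, sum)
```

This is why the two bars end up the same length even though the totals differ (55 for sex, 73 for age).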
If you want to use color, a less misleading way would be to use one color per category with a gradient within it. This can be achieved either with different mono-hue palettes or simply by adding an alpha (as I am using here); see my 2nd answer.
Upvotes: 1
Reputation: 23757
Another option would be to assign a single color to each category and either use a mono-hue color scale or simply pass an alpha based on the "order". The order is, as in my other answer, defined by the first occurrence of each value in the data frame.
The alpha solution is very simple and straightforward, but sometimes an alpha is not desired because it can interfere with your plot design.
For the mono-hue solution, and to make it scalable, I'd create a custom palette of your choice, pick as many colors from it as you have categories, and darken (or lighten) each color according to the order. You can use the shades or colorspace packages for this. I'd use colorspace here because its syntax is much simpler and because you're dealing with specific colors rather than whole palettes.
library(tidyverse)
sex <- data.frame(sex = c("M", "F")
,n = c(25, 30))
age <- data.frame(age = c("20-40","40-60","60-80")
,n = c(18, 30, 25))
## first sort your sex column according to your desired output.
sex <- sex[nrow(sex):1,]
my_categ <- c("sex", "age")
## random palette of your choice
my_pal <- c("steelblue", "snow", "tomato", "seagreen")
## get number of colors according to your categories
my_cols <- my_pal[1:length(my_categ)]
## automate this a bit by making a list for each category
df_long <-
  map(my_categ, ~{
    get(as.name(.x)) %>%
      ## convert to factor and use levels for fill
      mutate(order = as.integer(fct_inorder(.data[[.x]])))
  }) %>%
  ## bind to one data frame
  bind_rows() %>%
  ## make both sex and age one variable
  pivot_longer(all_of(my_categ)) %>%
  ## remove NAs
  drop_na(value) %>%
  ## get darker colors for each step
  mutate(color = my_cols[match(name, unique(name))],
         darken = colorspace::darken(color, amount = order/10))
## use the darkened color with scale_fill_identity
ggplot(df_long) +
  ## use the precomputed darkened color as fill
  geom_col(aes(n, name, fill = darken), position = "fill") +
  scale_fill_identity() +
  ## add the labels
  geom_text(aes(n, name, label = value),
            position = position_fill(vjust = .5))
ggplot(df_long) +
  ## use order as alpha
  geom_col(aes(n, name, fill = name, alpha = order), position = "fill") +
  ## add the labels
  geom_text(aes(n, name, label = value),
            position = position_fill(vjust = .5)) +
  ## here you need to explicitly remove the legend because you're not using scale_identity
  theme(legend.position = "none")
Created on 2022-07-03 by the reprex package (v2.0.1)
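To see what the darkening step does in isolation, here is a minimal sketch that applies colorspace::darken() to a single base color with the same order/10 amounts as above. It assumes the colorspace package is installed; the names base_col, steps and shades are only for illustration.

```r
## one base color, repeated once per step so that both vectors
## have the same length (as they do inside mutate() above)
base_col <- "steelblue"
steps    <- 1:3

## amount = 0.1, 0.2, 0.3: each step is darkened a little more
shades <- colorspace::darken(rep(base_col, length(steps)),
                             amount = steps / 10)

## one color string per step, ready to be used with scale_fill_identity()
shades
```

Because the amount grows with the order, categories with many levels end up with very dark segments, so you may want a smaller divisor than 10 for short bars or a larger one for long bars.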
Upvotes: 1
Reputation: 10627
library(tidyverse)
sex_data <- data.frame(
sex = c("M", "F"),
n = c(25, 30)
)
age_data <- data.frame(
age = c("20-40", "40-60", "60-80"),
n = c(18, 30, 25)
)
ggplot() +
geom_bar(data = sex_data, mapping = aes(x = "sex", fill = sex, y = n/sum(n)), stat = "identity") +
geom_bar(data = age_data, mapping = aes(x = "age", fill = age, y = n/sum(n)), stat = "identity") +
coord_flip() +
labs(x = "", y = "Proportion", fill = "")
Created on 2022-06-28 by the reprex package (v2.0.0)
Upvotes: 0