Reputation: 11
I have already created a grouped bar chart that shows the number of clients (n=25) grouped by age group and gender. Now I need to include all clients (n=52) who were screened for the study, to show the quality of the data. It should look like the attached bar chart, but with the total number stacked on top in a lighter color, so that one can see, for example, that in Age Group 1 we screened 15 females and 9 males, but in the end only 8 females signed up for the study. I hope that is understandable. I'll add the existing bar chart.
The code for this chart is the following:
ggplot(data = daten, aes(x = AG, y = stat(count), group = factor(GESCHLECHT), fill = factor(GESCHLECHT))) +
  geom_bar(position = position_dodge2(preserve = "single", padding = 0)) +
  scale_fill_manual(values = c("steelblue", "darkred"), labels = c("Männlich", "Weiblich")) +
  labs(y = "Anzahl", x = "Altersgruppen", fill = "Geschlecht:") +
  scale_x_continuous(breaks = seq(1, 3, by = 1), labels = c("25 bis <51", "51 bis <65", ">65")) +
  scale_y_continuous(limits = c(0, 10)) +
  geom_text(aes(label = stat(count)), stat = "count", vjust = -0.5, position = position_dodge(width = 1)) +
  theme(legend.position = "bottom")
I have no idea how to organize my data to do something like that. In the end, the first bar (n=8) should continue upward in a lighter color with, for example, n=7 more females, which would show that we screened 15 females in total in that age group.
I think I would first need to create a data frame with all n=52 clients, with codes for age group and gender. But how can I divide the bars so that the difference between screened and recruited clients becomes clear?
Upvotes: 0
Views: 275
Reputation: 10627
Something like this?
library(tidyverse)
set.seed(1)
# example data
n <- 100
data <- tibble(
  gender = sample(c("m", "f"), n, replace = TRUE),
  age = runif(n, 25, 70) %>% as.integer(),
  # TRUE for 80% of the clients, FALSE for the rest, in random order
  screened = rep(TRUE, 0.8 * n) %>% c(rep(FALSE, 0.2 * n)) %>% sample()
)
data
#> # A tibble: 100 × 3
#>    gender   age screened
#>    <chr>  <int> <lgl>
#>  1 m         54 TRUE
#>  2 f         40 FALSE
#>  3 m         37 FALSE
#>  4 m         69 TRUE
#>  5 f         53 TRUE
#>  6 m         34 TRUE
#>  7 m         30 TRUE
#>  8 m         46 TRUE
#>  9 f         66 TRUE
#> 10 f         51 TRUE
#> # … with 90 more rows
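data %>%
  mutate(
    # left-closed intervals to match the groups "25 to <51", "51 to <65", "65 and older"
    age_group = age %>% cut(breaks = c(25, 51, 65, Inf), right = FALSE)
  ) %>%
  ggplot(aes(x = age_group, fill = gender, alpha = screened)) +
  # bars are stacked by default; alpha separates the two groups within each color
  geom_bar() +
  scale_alpha_manual(values = c(`TRUE` = 1, `FALSE` = 0.5))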
Created on 2022-04-13 by the reprex package (v2.0.0)
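Note that geom_bar() stacks everything into one bar per age group. If you would rather keep the bars side by side per gender, as in your existing chart, one option is to facet by age group and stack only within each gender. A minimal sketch, assuming one row per screened client: the `recruited` column, the object names, and the simulated values are made up for illustration, while `AG` and `GESCHLECHT` follow the names in your code.

```r
library(tidyverse)

# Hypothetical data: one row per screened client (values are simulated, not real).
# AG = age group code (1-3), GESCHLECHT = gender code,
# recruited = TRUE if the client signed up for the study (assumed column).
set.seed(1)
n <- 52
clients <- tibble(
  AG = sample(1:3, n, replace = TRUE),
  GESCHLECHT = sample(c("m", "w"), n, replace = TRUE),
  recruited = sample(rep(c(TRUE, FALSE), times = c(25, 27)))
)

# Count clients per age group, gender, and recruitment status.
counts <- clients %>%
  count(AG, GESCHLECHT, recruited) %>%
  mutate(
    AG = factor(AG, levels = 1:3, labels = c("25 to <51", "51 to <65", ">65")),
    # "screened only" is the first level, so it is stacked on top of "recruited"
    status = factor(recruited, levels = c(FALSE, TRUE),
                    labels = c("screened only", "recruited"))
  )

# One panel per age group, one bar per gender, stacked by status.
p <- ggplot(counts, aes(x = GESCHLECHT, y = n, fill = GESCHLECHT)) +
  geom_col(aes(alpha = status)) +
  geom_text(aes(label = n, group = status),
            position = position_stack(vjust = 0.5)) +
  facet_wrap(~AG, nrow = 1, strip.position = "bottom") +
  scale_fill_manual(values = c(m = "steelblue", w = "darkred"),
                    labels = c("Male", "Female")) +
  scale_alpha_manual(values = c("screened only" = 0.4, "recruited" = 1)) +
  labs(y = "Count", x = NULL, fill = "Gender:", alpha = NULL) +
  theme(legend.position = "bottom")

p
```

The full height of each bar is then the number of screened clients, and the opaque part is the number who were recruited. With your real data you would skip the simulation step and only need a logical column marking who signed up.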
Upvotes: 1