Reputation: 183
Consider a dataset like the one below:
Col1 Col2
A BOY
B GIRL
A BOY
B BOY
A BOY
B GIRL
Both columns are categorical variables. I want to make a grouped bar chart of the two variables with the proportion on the Y axis, as I get with position = "fill".
How do I do that?
This is what I have:
ggplot(aboveData, aes(x = Col1, fill = Col2)) + geom_bar(position = "fill")
This comes out as a stacked bar chart. I want a grouped one.
Upvotes: 2
Views: 2334
Reputation: 47008
We first tally the counts:
library(dplyr)
library(ggplot2)
df = structure(list(Col1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A",
"B"), class = "factor"), Col2 = structure(c(1L, 2L, 1L, 1L, 1L,
2L), .Label = c("BOY", "GIRL"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
tab <- df %>% group_by(Col1, Col2, .drop = FALSE) %>% tally()
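As a small sketch of why `.drop = FALSE` appears in the tally (assuming dplyr 0.8 or later, where `group_by()` accepts this argument): because both columns are factors, it keeps the empty A/GIRL combination as a row with n = 0. The name `tab_dropped` is only for illustration and is not part of the answer.

```r
library(dplyr)

# Same six rows as in the question, with both columns as factors
df <- data.frame(
  Col1 = factor(c("A", "B", "A", "B", "A", "B")),
  Col2 = factor(c("BOY", "GIRL", "BOY", "BOY", "BOY", "GIRL"))
)

# Keep every factor combination, including the empty A/GIRL cell
tab <- df %>% group_by(Col1, Col2, .drop = FALSE) %>% tally()

# Default behaviour: empty combinations are dropped (illustrative name)
tab_dropped <- df %>% group_by(Col1, Col2) %>% tally()

nrow(tab)          # includes the A/GIRL row with n = 0
nrow(tab_dropped)  # the A/GIRL row is absent
```

With the zero row in place, a dodged plot reserves a slot for A/GIRL, so the A/BOY bar keeps the same width as the bars for B.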
It's not clear what you mean by proportion. If you mean the proportion within each level of the X variable (as is commonly plotted), then:
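# tab is still grouped by Col1 after tally(), so sum(n) is the total within each Col1
tab %>% mutate(perc = n / sum(n)) %>%
  ggplot() + geom_col(aes(x = Col1, y = perc, fill = Col2), position = "dodge") +
  scale_y_continuous(labels = scales::percent)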
If you meant the proportion of all observations, then:
# ungroup() first, so sum(n) is the total over all rows
tab %>% ungroup() %>%
  mutate(perc = n / sum(n)) %>%
  ggplot() + geom_col(aes(x = Col1, y = perc, fill = Col2), position = "dodge") +
  scale_y_continuous(labels = scales::percent)
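To cross-check the two definitions of proportion, here is a minimal base R sketch using `table()` and `prop.table()`. The names `counts`, `within_col1` and `overall` are illustrative and do not come from the answer.

```r
# Same six rows as in the question
df <- data.frame(
  Col1 = c("A", "B", "A", "B", "A", "B"),
  Col2 = c("BOY", "GIRL", "BOY", "BOY", "BOY", "GIRL")
)

# Cross-tabulate the two variables (rows: Col1, columns: Col2)
counts <- table(df$Col1, df$Col2)

# Proportion within each level of Col1: every row sums to 1
within_col1 <- prop.table(counts, margin = 1)

# Proportion of all observations: the whole table sums to 1
overall <- prop.table(counts)

within_col1
overall
```

The first table should match `perc` from the grouped version, and the second should match `perc` from the ungrouped version.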
Upvotes: 4
Reputation: 5640
It might be easier to work with ggplot using data in a long format (instead of wide) and to calculate the proportion of each level (A, B, BOY, GIRL) within each variable (Col1, Col2).
library(dplyr)
library(tidyr)
library(ggplot2)
# Your data
df <- data.frame(Col1 = rep(c("A", "B"), 3),
                 Col2 = c("BOY", "GIRL", "BOY", "BOY", "BOY", "GIRL"))
df1 <- df %>%
  # Change to long format
  pivot_longer(cols = c(Col1, Col2),
               names_to = "var") %>%
  group_by(value, var) %>%
  # Get the frequencies of A, B, BOY and GIRL
  count() %>%
  ungroup() %>%
  # Group by var, which now has levels Col1 and Col2
  group_by(var) %>%
  # Calculate the proportion within each variable
  mutate(perc = n / sum(n))
ggplot(df1, aes(x = var,
                y = perc,
                fill = value)) +
  geom_col(position = "dodge")
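The same long-format summary can probably be written a little more compactly with `dplyr::count()`, which groups, counts and ungroups in one step. This is only a sketch of an equivalent pipeline; `df_long` is a hypothetical name, and `everything()` assumes the data frame contains only the two columns shown.

```r
library(dplyr)
library(tidyr)

# Same six rows as in the question
df <- data.frame(
  Col1 = rep(c("A", "B"), 3),
  Col2 = c("BOY", "GIRL", "BOY", "BOY", "BOY", "GIRL")
)

df_long <- df %>%
  # Stack both columns: one row per (variable, level) observation
  pivot_longer(cols = everything(), names_to = "var") %>%
  # Frequency of each level within each variable
  count(var, value) %>%
  # Proportion of each level within its variable
  group_by(var) %>%
  mutate(perc = n / sum(n)) %>%
  ungroup()

df_long
```

Passing `df_long` to the same `ggplot()` call in place of `df1` should give the same dodged chart.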
Upvotes: 1