Reputation: 295
I am trying to create a frequency (in % terms) bar plot using the following data:
>fulldata
Type Category
Sal 0
Sal 0
Sal 1
Sal 0
Sal 1
Sal 1
Self 1
Self 0
Self 1
Self 0
Self 0
So, I am trying to create a bar plot (using ggplot) which shows both the % of Sal and Self in the fulldata and the % of Sal and Self in Category==1, side by side (with labels of % values).
I tried creating a separate data frame by filtering Category==1 from the fulldata, but the two sets of bars overlap each other. I tried the following:
> Category1 = fulldata[which(fulldata$Category==1),]
ggplot(fulldata, aes(x=Type,y = (..count..)/sum(..count..)))+
geom_bar()+
geom_label(stat = "count", aes(label=round(..count../sum(..count..),3)*100),
vjust=1.2,size=3, format_string='{:.1f}%')+
scale_y_continuous(labels = scales::percent)+
labs(x = "Type", y="Percentage")+
geom_bar(data = Category1, position = "dodge", color = "red")
*Original data has around 80000 rows.
Upvotes: 1
Views: 106
Reputation: 16178
One possible solution is to start by calculating all the proportions outside of ggplot2.
Here is a fake example:
df <- data.frame(Type = sample(c("Sal","Self"),100, replace = TRUE),
Category = sample(c(0,1),100, replace = TRUE))
We can calculate each proportion as follows to obtain the final data frame:
library(tidyr)
library(dplyr)
df %>% group_by(Category, Type) %>% count() %>%
pivot_wider(names_from = Category, values_from = n) %>%
mutate(Total = `0`+ `1`) %>%
pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
group_by(Category) %>%
mutate(Percent = n / sum(n))
# A tibble: 6 x 4
# Groups: Category [3]
Type Category n Percent
<fct> <chr> <int> <dbl>
1 Sal 0 27 0.458
2 Sal 1 22 0.537
3 Sal Total 49 0.49
4 Self 0 32 0.542
5 Self 1 19 0.463
6 Self Total 51 0.51
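As a quick check of the same idea on the eleven rows shown in the question, here is a minimal sketch that builds only the two groups the question asks for (all rows, and rows with Category == 1). The names overall, cat1 and props and the Subset labels are my own assumptions, not something from the original post:

```r
library(dplyr)

# The eleven rows shown in the question
fulldata <- data.frame(
  Type = c(rep("Sal", 6), rep("Self", 5)),
  Category = c(0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

# Share of each Type over all rows
overall <- fulldata %>%
  count(Type) %>%
  mutate(Subset = "All rows", Percent = n / sum(n))

# Share of each Type among the rows with Category == 1
cat1 <- fulldata %>%
  filter(Category == 1) %>%
  count(Type) %>%
  mutate(Subset = "Category == 1", Percent = n / sum(n))

# One long table, ready to be passed to ggplot2
props <- bind_rows(overall, cat1)
props
```

For these rows, Sal is 6 of 11 overall and 3 of 5 within Category == 1, so the two groups of percentages each sum to 100%.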
Then, if you add the ggplot2 call to the end of this pipeline, you can get the bar graph in one single sequence:
library(ggplot2)
df %>% group_by(Category, Type) %>% count() %>%
pivot_wider(names_from = Category, values_from = n) %>%
mutate(Total = `0`+ `1`) %>%
pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
group_by(Category) %>%
mutate(Percent = n / sum(n)) %>%
ggplot(aes(x = reorder(Category, desc(Category)), y = Percent, fill = Type))+
geom_col()+
geom_text(aes(label = scales::percent(Percent)), position = position_stack(0.5))+
scale_y_continuous(labels = scales::percent)+
labs(y = "Percentage", x = "Category")
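The plot above stacks the two types within each category. Since the question asks for the bars side by side, one option is to put Type on the x-axis and dodge the bars by subset. This is only a sketch: props is a hypothetical hand-typed table whose values come from the eleven rows in the question, and the Subset labels are my own naming.

```r
library(ggplot2)

# Hypothetical long table: proportions from the question's eleven rows
props <- data.frame(
  Type    = c("Sal", "Self", "Sal", "Self"),
  Subset  = c("All rows", "All rows", "Category == 1", "Category == 1"),
  Percent = c(6 / 11, 5 / 11, 3 / 5, 2 / 5)
)

p <- ggplot(props, aes(x = Type, y = Percent, fill = Subset)) +
  # position_dodge places the two subsets next to each other
  geom_col(position = position_dodge(width = 0.9)) +
  # the same dodge width keeps each label centred over its own bar
  geom_text(aes(label = scales::percent(Percent, accuracy = 0.1)),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Type", y = "Percentage", fill = NULL)
p
```

Because the percentages are computed before plotting, the size of the original data (around 80000 rows) does not matter to ggplot2; it only ever sees the small summary table.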
Does this answer your question?
Upvotes: 1