Looper

Create combined bar plot of multiple variables using ggplot

I am trying to create a frequency (in % terms) bar plot using the following data:

>fulldata
Type Category
Sal         0
Sal         0
Sal         1
Sal         0
Sal         1
Sal         1
Self        1
Self        0
Self        1
Self        0
Self        0
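
To make the example reproducible, the printed data can be rebuilt as a data frame. This is a sketch that assumes Type is a text column and Category is numeric 0/1, as printed above. The two `prop.table()` calls give the target numbers the plot should display.

```r
# Rebuild the printed `fulldata` (assumes Type is text and Category is numeric 0/1)
fulldata <- data.frame(
  Type     = c(rep("Sal", 6), rep("Self", 5)),
  Category = c(0, 0, 1, 0, 1, 1,   # Sal rows
               1, 0, 1, 0, 0),     # Self rows
  stringsAsFactors = FALSE
)

# Share of each Type among all rows: Sal 6/11, Self 5/11
prop.table(table(fulldata$Type))

# Share of each Type among rows with Category == 1: Sal 3/5, Self 2/5
prop.table(table(fulldata$Type[fulldata$Category == 1]))
```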

So, I want a bar plot (using ggplot) that shows, side by side, the percentage of Sal and Self in fulldata and the percentage of Sal and Self among the rows where Category == 1, with the % values as labels. I tried creating a separate data frame by filtering Category == 1 from fulldata, but the two sets of bars overlap each other. I tried the following:

Category1 = fulldata[which(fulldata$Category==1),]

ggplot(fulldata, aes(x=Type,y = (..count..)/sum(..count..)))+
    geom_bar()+
    geom_label(stat = "count", aes(label=round(..count../sum(..count..),3)*100), 
               vjust=1.2,size=3, format_string='{:.1f}%')+
    scale_y_continuous(labels = scales::percent)+
    labs(x = "Type", y="Percentage")+
    geom_bar(data = Category1, position = "dodge", color = "red")

*The original data has around 80,000 rows.

Answers (1)

dc37

One possible solution is to calculate all the proportions first, outside of ggplot2.

Here is a fake example:

df <- data.frame(Type = sample(c("Sal","Self"),100, replace = TRUE),
                 Category = sample(c(0,1),100, replace = TRUE))

We can calculate each proportion as follows to obtain the final data frame:

library(tidyr)
library(dplyr)

df %>% group_by(Category, Type) %>% count() %>% 
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0`+ `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n))

# A tibble: 6 x 4
# Groups:   Category [3]
  Type  Category     n Percent
  <fct> <chr>    <int>   <dbl>
1 Sal   0           27   0.458
2 Sal   1           22   0.537
3 Sal   Total       49   0.49 
4 Self  0           32   0.542
5 Self  1           19   0.463
6 Self  Total       51   0.51 
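
As an aside, the "Total" group can also be built without the pivot_wider()/pivot_longer() round trip: stack the data with a copy whose Category is relabelled "Total", then count. This sketch swaps in that stacking technique (it is not the approach above) and uses the question's small fulldata so the percentages can be checked by hand. Stacking doubles the row count, which is negligible for around 80,000 rows.

```r
library(dplyr)

# Sample data from the question
fulldata <- data.frame(
  Type     = c(rep("Sal", 6), rep("Self", 5)),
  Category = c(0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

# Stack the original rows with a copy labelled "Total",
# then compute each Type's share within every Category.
props <- bind_rows(
  fulldata %>% mutate(Category = as.character(Category)),
  fulldata %>% mutate(Category = "Total")
) %>%
  count(Category, Type) %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup()

props
```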

Then, if you append the ggplot2 calls to the same pipeline, you can get the bar graph in one single sequence:

library(ggplot2)

df %>% group_by(Category, Type) %>% count() %>% 
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0`+ `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n)) %>%
  ggplot(aes(x = reorder(Category, desc(Category)), y = Percent, fill = Type))+
  geom_col()+
  geom_text(aes(label = scales::percent(Percent)), position = position_stack(0.5))+
  scale_y_continuous(labels = scales::percent)+
  labs(y = "Percentage", x = "Category")
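
The plot above stacks Sal and Self within each category. If you want the side-by-side comparison described in the question (all rows versus Category == 1), one variant is to put both sets of percentages into a single long table and dodge the bars by a label column. Position adjustments such as dodging only act within one layer, which is why a second geom_bar() layer with its own data overlapped instead of sitting next to the first. In this sketch, `Subset` and its labels are hypothetical names chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
fulldata <- data.frame(
  Type     = c(rep("Sal", 6), rep("Self", 5)),
  Category = c(0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

# One long table: each Type's share among all rows and among Category == 1.
# `Subset` is a hypothetical label column introduced for this sketch.
props <- bind_rows(
  fulldata %>% mutate(Subset = "All rows"),
  fulldata %>% filter(Category == 1) %>% mutate(Subset = "Category == 1")
) %>%
  count(Subset, Type) %>%
  group_by(Subset) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup()

# Bars and labels share one dodge width so each label sits over its own bar
dodge <- position_dodge(width = 0.9)

p <- ggplot(props, aes(x = Type, y = Percent, fill = Subset)) +
  geom_col(position = dodge) +
  geom_text(aes(label = scales::percent(Percent, accuracy = 0.1), group = Subset),
            position = dodge, vjust = -0.3, size = 3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Type", y = "Percentage", fill = NULL)

print(p)
```

Putting Type on the x axis and Subset in `fill` keeps the two Sal bars next to each other; swap the two mappings if you would rather group the bars by subset.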

Does this answer your question?
