Pineapple

Reputation: 203

Get the proportions in ggplot2 (R) bar charts

Can someone give me some hints about what I am doing wrong in my code, or what I need to change to get the correct percentages? I am trying to get the proportions directly in my ggplot2 code, and I would prefer not to add a column with mutate. However, if I can't get ggplot2 to compute the correct proportions, I am open to adding columns.

Here is the reproducible data:

cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")

bigcats<-data.frame(cat_type=cat_type,country=country)

My data gives me the following proportions (these are correct):

> table(bigcats$cat_type, bigcats$country) ## raw numbers
   
    India Indonesia Russia
  1     7         3      2
  2     2         1      1
  3     4         5      4
> 
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
   
    India Indonesia Russia
  1  53.8      33.3   28.6
  2  15.4      11.1   14.3
  3  30.8      55.6   57.1

However, my ggplot2 code gives me incorrect proportions:

bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
  geom_bar(position = position_fill())+ 
  geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
  labs(y="Percent",title="Top Big Cat Populations",x="Country")+
  scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
  scale_y_continuous(labels = scales::percent)


Upvotes: 1

Views: 1117

Answers (1)

stefan

Reputation: 123783

The issue is that prop.table(stat(count)) does not compute the proportions within each country. It divides each count by the grand total over all combinations of cat_type and country, i.e. you effectively do:

library(dplyr)

bigcats %>% 
  count(cat_type, country) %>% 
  mutate(pct = scales::percent(prop.table(n)))
#>   cat_type   country n   pct
#> 1        1     India 7 24.1%
#> 2        1 Indonesia 3 10.3%
#> 3        1    Russia 2  6.9%
#> 4        2     India 2  6.9%
#> 5        2 Indonesia 1  3.4%
#> 6        2    Russia 1  3.4%
#> 7        3     India 4 13.8%
#> 8        3 Indonesia 5 17.2%
#> 9        3    Russia 4 13.8%
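For contrast, here is a minimal sketch of the same summary with the denominator taken within each country instead of over the whole table. It assumes dplyr is available and rebuilds the question's data so it runs on its own; the name by_country is only illustrative.

```r
library(dplyr)

# Data copied from the question
cat_type <- c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country <- c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats <- data.frame(cat_type = cat_type, country = country)

# Count each cat_type/country combination, then divide by the
# total within the same country rather than by the grand total
by_country <- bigcats %>%
  count(cat_type, country) %>%
  group_by(country) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

by_country
```

With this grouping the pct values sum to 1 within each country, which is what the plot labels need.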

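Using a helper function to reduce code duplication, you could compute your desired proportions within each country like so (after_stat(x) is the position of the country on the x axis, so it serves as the grouping variable):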

library(ggplot2)
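# Divide each count by the sum of the counts in its group
# (in the plot, the group is the x position, i.e. the country)
prop <- function(count, group) {
  count / tapply(count, group, sum)[group]
}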

ggplot(bigcats, aes(
  x = country, y = prop(after_stat(count), after_stat(x)),
  fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
  geom_bar(position = position_fill()) +
  geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
  labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
  scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
  scale_y_continuous(labels = scales::percent)

Created on 2021-07-28 by the reprex package (v2.0.0)
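As a quick sanity check outside of ggplot2, the helper can be applied to a plain table of counts and compared with the column proportions from the question. This is only a sketch in base R: the names tab and counts are illustrative, and the as.character() and as.numeric() calls are additions here so that the lookup is done by country name and the result is a plain numeric vector.

```r
# Data copied from the question
cat_type <- c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country <- c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats <- data.frame(cat_type = cat_type, country = country)

# Same helper as in the answer
prop <- function(count, group) {
  count / tapply(count, group, sum)[group]
}

# One row per cat_type/country combination with its count in Freq
tab <- table(cat_type = bigcats$cat_type, country = bigcats$country)
counts <- as.data.frame(tab)

# Proportion of each cat_type within its country
counts$pct <- as.numeric(prop(counts$Freq, as.character(counts$country)))

# Compare with the column proportions computed via prop.table()
all.equal(counts$pct, as.vector(prop.table(tab, 2)))
```

If the comparison holds, the helper reproduces the proportions by column total that the question lists as correct.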

Upvotes: 2
