how to create stacked bar charts for multiple variables with percentages

Question

I am trying to create a stacked bar chart with multiple variables, but I am stuck on two issues:

1) I can't seem to get the rotated y-axis to display percentages instead of counts.

2) I would like to sort the variables (desc) based on the percentage of the "strongly agree" response.

Here is an example of what I have so far:

require(scales)
require(ggplot2)
require(reshape2)

# create data frame
  my.df <- data.frame(replicate(10, sample(1:4, 200, rep=TRUE)))
  my.df$id <- seq(1, 200, by = 1)

# melt
  melted <- melt(my.df, id.vars="id")

# factors
  melted$value <- factor(melted$value, 
                         levels=c(1,2,3,4),
                         labels=c("strongly disagree", 
                                  "disagree", 
                                  "agree", 
                                  "strongly agree"))
# plot
  ggplot(melted) + 
    geom_bar(aes(variable, fill=value, position="fill")) +
    scale_fill_manual(name="Responses",
                      values=c("#EFF3FF", "#BDD7E7", "#6BAED6",
                               "#2171B5"),
                      breaks=c("strongly disagree", 
                               "disagree", 
                               "agree", 
                               "strongly agree"),
                      labels=c("strongly disagree", 
                               "disagree", 
                               "agree", 
                               "strongly agree")) +
    labs(x="Items", y="Percentage (%)", title="my title") +
    coord_flip()

I owe thanks to several folks for help in getting this far. Here are a few of the many pages that Google served up:

http://www.r-bloggers.com/fumblings-with-ranked-likert-scale-data-in-r/

Create stacked barplot where each stack is scaled to sum to 100%

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape2_1.2.2  ggplot2_0.9.2.1 scales_0.2.2   

loaded via a namespace (and not attached):
 [1] colorspace_1.2-0    dichromat_1.2-4     digest_0.6.0        grid_2.15.0         gtable_0.1.1        HH_2.3-23          
 [7] labeling_0.1        lattice_0.20-10     latticeExtra_0.6-24 MASS_7.3-22         memoise_0.1         munsell_0.4        
[13] plyr_1.7.1          proto_0.3-9.2       RColorBrewer_1.0-5  rstudio_0.97.237    stringr_0.6.1       tools_2.15.0

Arun · Accepted Answer

For (1)
To get percentages, one option is to compute them into a new data.frame built from melted. At least, that's the way I could think of.

# Each variable has 200 responses, so divide each count by 200 to get a percentage
require(plyr)
df <- ddply(melted, .(variable, value), function(x) length(x$value)/200 * 100)
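As an aside, the same percentages can be computed without hard-coding 200. The following is a minimal base-R sketch, not part of the original answer. It assumes a fixed seed for reproducibility, uses stack() as a stand-in for melt(), and the name df2 is hypothetical, chosen so that df above is left untouched.

```r
# Rebuild the question's data (set.seed is an assumption, added for reproducibility)
set.seed(1)
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))

# Base-R stand-in for melt(): one row per response, in long form
melted <- stack(my.df)
names(melted) <- c("value", "variable")
melted$value <- factor(melted$value, levels = 1:4,
                       labels = c("strongly disagree", "disagree",
                                  "agree", "strongly agree"))

# Count responses per variable, then convert each row of counts to percentages
tab <- table(variable = melted$variable, value = melted$value)
pct <- prop.table(tab, margin = 1) * 100

# Long-format data.frame with the percentage in column V1, as in the ddply result
df2 <- as.data.frame(pct, responseName = "V1")
head(df2)
```

Because prop.table() divides by each variable's own row total, this keeps working if the number of respondents changes or differs between items.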

Then supply the calculated percentages as weights in geom_bar as follows:

ggplot(df) + 
geom_bar(aes(variable, fill=value, weight=V1)) +
scale_fill_manual(name="Responses",
                  values=c("#EFF3FF", "#BDD7E7", "#6BAED6",
                           "#2171B5"),
                  breaks=c("strongly disagree", 
                           "disagree", 
                           "agree", 
                           "strongly agree"),
                  labels=c("strongly disagree", 
                           "disagree", 
                           "agree", 
                           "strongly agree")) +
labs(x="Items", y="Percentage (%)", title="my title") +
coord_flip()
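An alternative sketch, not part of the original answer: position is an argument of geom_bar(), not an aesthetic, so it has no effect inside aes(). Passed as an argument, position = "fill" rescales each bar to sum to 1, and the percent formatter from the scales package can label the axis as percentages. The seed and the use of stack() in place of melt() are assumptions made to keep the example self-contained.

```r
library(ggplot2)
library(scales)

# Rebuild the question's data (set.seed is an assumption, added for reproducibility)
set.seed(1)
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))

# Base-R stand-in for melt(): one row per response, in long form
melted <- stack(my.df)
names(melted) <- c("value", "variable")
melted$value <- factor(melted$value, levels = 1:4,
                       labels = c("strongly disagree", "disagree",
                                  "agree", "strongly agree"))

p <- ggplot(melted, aes(x = variable, fill = value)) +
  geom_bar(position = "fill") +              # argument of geom_bar(), outside aes()
  scale_y_continuous(labels = percent) +     # show 0-1 proportions as 0%-100%
  scale_fill_manual(name = "Responses",
                    values = c("#EFF3FF", "#BDD7E7", "#6BAED6", "#2171B5")) +
  labs(x = "Items", y = "Percentage (%)", title = "my title") +
  coord_flip()
print(p)
```

With this approach no pre-computed percentage column is needed; the trade-off is that the percentages themselves are not available as data for sorting or labelling.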

I don't quite understand (2). Do you want to (a) calculate relative percentages (with "strongly agree" as the reference), or (b) always have the plot display "strongly agree" first, then "agree", and so on? You can accomplish (b) by reordering the factor levels in df:

df$value <- factor(df$value, levels=c("strongly agree", "agree", "disagree", 
                 "strongly disagree"), ordered = TRUE)

Edit: You can reorder the levels of variable and value to the order you require as follows:

variable.order <- names(sort(daply(df, .(variable), 
                     function(x) x$V1[x$value == "strongly agree"] ), 
                     decreasing = TRUE))
value.order <- c("strongly agree", "agree", "disagree", "strongly disagree")
df$variable <- factor(df$variable, levels = variable.order, ordered = TRUE)
df$value <- factor(df$value, levels = value.order, ordered = TRUE)
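To make the sorting step easier to check, here is a small base-R sketch of the same idea on a toy data frame. The three items and their percentages are made up for illustration and are not from the question's data.

```r
# Toy percentages (hypothetical values, for illustration only)
df <- data.frame(
  variable = rep(c("X1", "X2", "X3"), each = 2),
  value    = rep(c("agree", "strongly agree"), times = 3),
  V1       = c(70, 30, 40, 60, 55, 45)
)

# Keep only the "strongly agree" rows and rank the items by that percentage
sa <- df[df$value == "strongly agree", ]
variable.order <- as.character(sa$variable[order(sa$V1, decreasing = TRUE)])

# Reorder the factor levels; ggplot2 draws discrete axes in level order
df$variable <- factor(df$variable, levels = variable.order)
levels(df$variable)
```

Note that with coord_flip() the first level is drawn at the bottom of the plot, so wrapping the levels in rev() puts the item with the highest "strongly agree" percentage at the top.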
