Reputation: 7725
I am trying to create a stacked bar chart with multiple variables, but I am stuck on two issues:
1) I can't seem to get the rotated y-axis to display percentages instead of counts.
2) I would like to sort the variables (desc) based on the percentage of the "strongly agree" response.
Here is an example of what I have so far:
require(scales)
require(ggplot2)
require(reshape2)
# create data frame
my.df <- data.frame(replicate(10, sample(1:4, 200, rep=TRUE)))
my.df$id <- seq(1, 200, by = 1)
# melt
melted <- melt(my.df, id.vars="id")
# factors
melted$value <- factor(melted$value,
levels=c(1,2,3,4),
labels=c("strongly disagree",
"disagree",
"agree",
"strongly agree"))
# plot
ggplot(melted) +
geom_bar(aes(variable, fill=value, position="fill")) +
scale_fill_manual(name="Responses",
values=c("#EFF3FF", "#BDD7E7", "#6BAED6",
"#2171B5"),
breaks=c("strongly disagree",
"disagree",
"agree",
"strongly agree"),
labels=c("strongly disagree",
"disagree",
"agree",
"strongly agree")) +
labs(x="Items", y="Percentage (%)", title="my title") +
coord_flip()
I owe thanks to several folks for help in getting this far. Here are a few of the many pages that Google served up:
http://www.r-bloggers.com/fumblings-with-ranked-likert-scale-data-in-r/
Create stacked barplot where each stack is scaled to sum to 100%
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 ggplot2_0.9.2.1 scales_0.2.2
loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4 digest_0.6.0 grid_2.15.0 gtable_0.1.1 HH_2.3-23
[7] labeling_0.1 lattice_0.20-10 latticeExtra_0.6-24 MASS_7.3-22 memoise_0.1 munsell_0.4
[13] plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5 rstudio_0.97.237 stringr_0.6.1 tools_2.15.0
Upvotes: 2
Views: 3386
Reputation: 10478
Since you are working with Likert data, you might want to consider the likert() function in package HH. (Hopefully it is OK to point you in another direction, given that there is already a nice answer addressing your original ggplot2 approach.)
As one might hope, likert() plots in a Likert-appropriate way with minimal struggle. positive.order=TRUE will sort the items by how far they extend in the positive direction. The ReferenceZero argument lets you zero-center through the middle of a neutral item (not needed below, but shown here). And as.percent=TRUE will convert counts into percents and list the actual counts in the margin (unless we turn that off).
library(reshape2)
library(HH)
# create data as before
my.df <- data.frame(replicate(10, sample(1:4, 200, rep=TRUE)))
my.df$id <- seq(1, 200, by = 1)
# melt() and dcast() with reshape2 package
melted <- melt(my.df, id.vars="id", na.rm=TRUE)
summd <- dcast(data=melted,variable~value, length) # note: length()
# not robust if NAs present
# give names to cols and rows for likert() to use
names(summd) <- c("Question", "strongly disagree",
"disagree",
"agree",
"strongly agree")
rownames(summd) <- summd[,1] # question number as rowname
summd[,1] <- NULL
# plot
likert(summd,
as.percent=TRUE, # automatically scales
main = NULL, # or give "title",
xlab = "Percent", # label axis
positive.order = TRUE, # orders by furthest right
ReferenceZero = 2.5, # zero point btwn levels 2&3
ylab = "Question", # label for left side
auto.key = list(space = "right", columns = 1,
reverse = TRUE)) # make positive items on top of legend
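If NAs are a concern (see the note on length() above), one option is to tabulate with table(), which drops NA by default. This is a minimal base-R sketch on the same kind of simulated data. The seed, the injected NAs, and the names resp.labels and counts are assumptions added for illustration, not part of the code above.

```r
set.seed(42)  # assumption: seed added only for reproducibility
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))
my.df$X1[1:5] <- NA  # assumption: a few missing answers injected for illustration

resp.labels <- c("strongly disagree", "disagree", "agree", "strongly agree")

# one row per question, one column per response level;
# table() ignores NA unless useNA is set
counts <- t(sapply(my.df, function(col) {
  table(factor(col, levels = 1:4, labels = resp.labels))
}))

rowSums(counts)  # number of non-missing answers per question
```

The resulting matrix has the questions as row names and the response labels as column names, so it should be usable in place of summd (for example via as.data.frame(counts)), though I have not checked that against every HH version.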
Upvotes: 4
Reputation: 118779
For (1)
To get percentages, you'll have to create a data.frame from melted. At least that's the way I could think of.
# every item has exactly 200 responses, so count / 200 * 100 gives the percentage
require(plyr)
df <- ddply(melted, .(variable, value), function(x) length(x$value)/200 * 100)
Then supply the calculated percentages as weights in geom_bar as follows:
ggplot(df) +
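# the weights already sum to 100 per item, so plain stacking shows percentages;
# position="fill" inside aes() is not a valid aesthetic and has been dropped
geom_bar(aes(variable, fill=value, weight=V1)) +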
scale_fill_manual(name="Responses",
values=c("#EFF3FF", "#BDD7E7", "#6BAED6",
"#2171B5"),
breaks=c("strongly disagree",
"disagree",
"agree",
"strongly agree"),
labels=c("strongly disagree",
"disagree",
"agree",
"strongly agree")) +
labs(x="Items", y="Percentage (%)", title="my title") +
coord_flip()
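A possible alternative that skips the pre-computed percentages: pass position="fill" to geom_bar() itself (not inside aes()) and let percent from the scales package format the axis. This is only a sketch under a few assumptions: the seed is added for reproducibility, stack() is used as a base-R stand-in for melt() (so the columns are called ind and values), and the y label is shortened because the tick labels already carry the % sign.

```r
library(ggplot2)
library(scales)

set.seed(42)  # assumption: seed added only for reproducibility
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))

# base-R stand-in for melt(): 'ind' is the item, 'values' the response
long <- stack(my.df)
long$values <- factor(long$values, levels = 1:4,
                      labels = c("strongly disagree", "disagree",
                                 "agree", "strongly agree"))

p <- ggplot(long, aes(x = ind, fill = values)) +
  geom_bar(position = "fill") +            # each bar is rescaled to sum to 1
  scale_y_continuous(labels = percent) +   # display 0..1 as 0%..100%
  scale_fill_manual(name = "Responses",
                    values = c("#EFF3FF", "#BDD7E7", "#6BAED6", "#2171B5")) +
  labs(x = "Items", y = "Percentage", title = "my title") +
  coord_flip()
print(p)
```

Because the normalisation happens per bar, this does not depend on every item having the same number of responses.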
I don't quite understand (2). Do you want to (a) calculate relative percentages (with "strongly agree" as the reference)? Or (b) do you want the plot to always display "strongly agree" first, then "agree", and so on? You can accomplish (b) by reordering the factor levels in df:
df$value <- factor(df$value, levels=c("strongly agree", "agree", "disagree",
"strongly disagree"), ordered = TRUE)
Edit:
You can reorder the levels of variable and value to the order you require as follows:
variable.order <- names(sort(daply(df, .(variable),
function(x) x$V1[x$value == "strongly agree"] ),
decreasing = TRUE))
value.order <- c("strongly agree", "agree", "disagree", "strongly disagree")
df$variable <- factor(df$variable, levels = variable.order, ordered = TRUE)
df$value <- factor(df$value, levels = value.order, ordered = TRUE)
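The same ordering idea can be sketched in base R directly on the raw responses, without plyr. The assumptions here: the seed is added for reproducibility, stack() stands in for melt() (columns ind and values), and share.sa and item.order are illustrative names. Note that this version sorts ascending, since with coord_flip() the first factor level is drawn at the bottom, so ascending levels put the item with the highest "strongly agree" share at the top.

```r
set.seed(42)  # assumption: seed added only for reproducibility
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))
long <- stack(my.df)  # 'ind' = item, 'values' = response code 1..4

# proportion of "strongly agree" (coded 4) per item
share.sa <- tapply(long$values == 4, long$ind, mean)

# item levels in ascending order of that proportion
item.order <- names(sort(share.sa))
long$ind <- factor(long$ind, levels = item.order)

round(100 * share.sa[item.order], 1)  # percentages in plotting order
```

Use sort(share.sa, decreasing = TRUE) instead if the highest share should come first in an unflipped plot.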
Upvotes: 3