user2460499

ggplot: make each fill group add up to 100%

Here is a data frame:

DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"), 
                 Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'), 50, replace = TRUE))

I have created this graph.

library(ggplot2)
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
       geom_bar(position = 'dodge', aes(y = (..count..)/sum(..count..))) +
       geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))),
                 stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
       scale_y_continuous(labels = scales::percent) +
       ylab("Percent") + xlab("Response") +
       theme(axis.text.x = element_text(angle = 75, hjust = 1))

Is there a way to make the bars for each school year add up to 100% in the graph, without stacking them?

I know this is similar to the question "Create stacked barplot where each stack is scaled to sum to 100%", but I don't want the graph to be stacked, and I can't figure out how to apply that solution to this situation. I would also prefer not to summarize the data before graphing: I have to make this graph many times with different data, and would rather not summarize the data each time.

Answers (2)

Mike H.

Big disclaimer: I would highly recommend summarizing your data beforehand rather than doing these calculations inside ggplot. That is not what ggplot is designed for. It not only complicates your code unnecessarily, but can easily introduce bugs or unintended results.
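For reference, summarizing first takes only a couple of lines. This is a minimal sketch of that route, assuming DF from the question. The set.seed() call is added only so the sketch is reproducible. table() counts responses per school year, prop.table(margin = 1) divides each row by its own total, and geom_col() then plots the proportions directly:

```r
library(ggplot2)

# Example data from the question (set.seed added only for reproducibility)
set.seed(1)
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
                 Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"),
                                50, replace = TRUE))

# Count responses per school year, then divide each row (school year) by its total
counts <- table(SchoolYear = DF$SchoolYear, Value = DF$Value)
pct <- as.data.frame(prop.table(counts, margin = 1), responseName = "Percent")

# Each school year's Percent values sum to 1, so its dodged bars add up to 100%
p <- ggplot(pct, aes(x = Value, y = Percent, fill = SchoolYear)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = scales::percent(Percent)),
            vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent") + xlab("Response") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
p
```

Wrapping these lines in a function gives the "summarize every time" step for free when the graph has to be redrawn for new data.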

That said, what you want does appear to be doable without summarizing first. A very hacky way to do the calculations within ggplot would be:

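library(ggplot2)

# Store the school years; the calculation below assumes the computed counts alternate between them in this order
fac <- unique(DF$SchoolYear)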

ggplot(DF, aes(x = Value, fill = SchoolYear)) +
  geom_bar(position = 'dodge', aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))) +   
  geom_text(aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum), label = scales::percent((..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))),
            stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) + 
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent") + xlab("Response") +   
  theme(axis.text.x = element_text(angle = 75, hjust = 1))

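This takes the ..count.. variable and divides it by the sum within its respective group using stats::ave. Note that this is extremely fragile: fac has length 2 and is recycled along the computed counts, so the result is only correct if those counts alternate between the two school years in the same order as fac. If, for example, one school year has no responses in some category, the groups no longer line up and the percentages come out wrong.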
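To see what stats::ave() contributes, here is the same calculation on a plain vector. This is a sketch with made-up counts, not values taken from the plot. The length-2 grouping vector is recycled exactly as fac is inside the plot, which is why the hack depends on the counts alternating between school years:

```r
# Hypothetical counts, ordered the way the hack assumes:
# response 1 / first year, response 1 / second year, response 2 / first year, ...
count <- c(3, 5, 2, 4, 5, 6)
fac <- c("2015-2016", "2016-2017")

# ave() replaces each element with its group total; fac is recycled over the
# six counts, so positions 1, 3, 5 form one group and 2, 4, 6 the other
totals <- stats::ave(count, fac, FUN = sum)
totals
# 10 15 10 15 10 15

# Share of each count within its own school year
share <- count / totals
share
```

If a count were missing from the middle of the vector, every later position would be assigned to the wrong year, which is the failure mode described above.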

Finally, check that the plot is in fact showing what we want:

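# Check that the plotted values are correct
library(data.table)
# as.data.table() makes a copy, so DF itself is not converted by reference
d2 <- as.data.table(DF)[, .(count = .N), by = .(SchoolYear, Value)][, percent := count/sum(count), by = SchoolYear]
d2[order(SchoolYear, Value)]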
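A less order-dependent way to stay inside ggplot is sketched here as an alternative, not as part of the answer above. stat_count() also computes prop, which is each count divided by the total of its group. Setting group = SchoolYear makes prop the within-year share. This assumes ggplot2 3.3.0 or later, for after_stat(), and uses DF from the question:

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
                 Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"),
                                50, replace = TRUE))

# group = SchoolYear: stat_count() computes prop within each school year
# rather than within each Value/SchoolYear bar
p2 <- ggplot(DF, aes(x = Value, fill = SchoolYear, group = SchoolYear)) +
  geom_bar(aes(y = after_stat(prop)), position = "dodge") +
  geom_text(aes(y = after_stat(prop), label = after_stat(scales::percent(prop))),
            stat = "count", vjust = -0.25, size = 2,
            position = position_dodge(width = 0.9)) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent") + xlab("Response") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
p2
```

Because each group's prop is computed from that group's own rows, this does not rely on the order of the computed counts or on every Value/SchoolYear combination being present.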

LucyMLi

I'm not sure how to create the plot you want without transforming the data. But if you want to reuse the same code for multiple datasets, you can write a function that transforms your data and generates the plot at the same time:

library(ggplot2)

# Assumes the data has exactly two columns: SchoolYear, then Value
plot.fun <- function (original.data) {
    # Share of each Value within each SchoolYear, reshaped to long format (Value, SchoolYear, value)
    newDF <- reshape2::melt(apply(table(original.data), 1, prop.table))
    Plot <- ggplot(newDF, aes(x=Value, y=value)) + 
            geom_bar(aes(fill=SchoolYear), stat="identity", position="dodge") +
            # Dodge width 0.9 matches the default bar width, so labels sit over their bars
            geom_text(aes(group=SchoolYear, label=scales::percent(value)), stat="identity", vjust=-0.25, size=2, position=position_dodge(width=0.9)) + 
            scale_y_continuous(labels=scales::percent) + 
            ylab("Percent") + xlab("Response") +   
            theme(axis.text.x = element_text(angle = 75, hjust = 1))
    return (Plot)
}
plot.fun(DF)
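The summarizing inside plot.fun() happens in its first line. This small base R sketch, assuming DF from the question (set.seed() added only for reproducibility), shows what apply(table(...), 1, prop.table) returns: a Value by SchoolYear matrix whose columns each sum to 1. That is the transpose of prop.table(table(...), margin = 1):

```r
set.seed(1)  # only so the sketch is reproducible
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
                 Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"),
                                50, replace = TRUE))

tab <- table(DF)                          # counts indexed by SchoolYear, then Value (DF's column order)
via_apply <- apply(tab, 1, prop.table)    # one column per school year, one row per Value
via_prop  <- prop.table(tab, margin = 1)  # same proportions, one row per school year

colSums(via_apply)  # each school year sums to 1
all.equal(unclass(t(via_prop)), via_apply, check.attributes = FALSE)
```

reshape2::melt() then turns that matrix into the long data frame (Value, SchoolYear, value) that ggplot() uses in plot.fun().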
