Jonah Davids

Reputation: 41

ggplot Graph with Standard Deviation fill

I am trying to create a graph where X is a dichotomous categorical variable, Y is a continuous variable, and the 'fill' or 'grouping' variable is +1 and -1 SD of a quantitative variable. Here is an example of what I'm trying to achieve:

Note: The fill here (liberals -1 SD vs. conservatives +1 SD) is derived from a political orientation variable where low scores = more liberal and high scores = more conservative.

I've tried to create my own version of this graph, but am running into problems. Here is an example with reproducible code that shows where I am at currently:

library(tidyverse, Rmisc)

diamonds <- diamonds
diamonds <- diamonds %>% filter(cut == "Premium" | cut == "Fair")
diamondsgraph <- Rmisc::summarySE(diamonds, measurevar="carat", groupvars=c("cut","price"))

ggplot(diamondsgraph, aes(x=cut, y=carat, fill=cut(scale(price), breaks = c(-Inf, -1, 1, Inf)))) +
  geom_col(position=position_dodge(.9), colour="black") +
  geom_errorbar(position=position_dodge(.9), width=.25, aes(ymin=price-se, ymax=price+se)) +
  coord_cartesian(ylim=c(1,5)) +
  theme_bw() + ylab("carat") + labs(fill="price")
  ggtitle("diamonds Data Example") + scale_fill_discrete(
    limits = c("(-Inf,-1]", "(1, Inf]"),
    labels = c("-1 SD", "+ 1SD")) 


In this case, x = cut (Fair vs. Premium), y = carat, and fill = price.

I am running into three problems:

  1. The green columns should not be on the graph, only the red and blue.
  2. The fill label should just be +1 SD and -1 SD.
  3. There should only be one standard error bar per column.

Any help would be greatly appreciated!

Upvotes: 1

Views: 449

Answers (1)

StupidWolf

Reputation: 46948

In your example, to separate the groups into +1 SD and -1 SD, you should scale the raw data first, assign each row to one of the two labels, and then plot. You are calculating the mean first and then scaling it, which doesn't make sense. The SE can be calculated on the fly by stat_summary().

Using the same dataset, there are no values of price below -1 SD, so I use a cutoff of 0.5 SD instead. If you change the cutoff, the labels change accordingly:

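library(tidyverse)

SDcut <- 0.5  # cutoff in SD units

diamondsgraph <- diamonds %>%
  filter(cut == "Premium" | cut == "Fair") %>%
  # standardise price; c() drops the matrix attributes that scale() returns
  mutate(price = c(scale(price))) %>%
  # keep only rows beyond the cutoff on either side of the mean
  filter(abs(price) > SDcut) %>%
  mutate(label = ifelse(price > 0, paste("+", SDcut, "SD"), paste("-", SDcut, "SD")))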
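
If you want to check the cutoff before plotting, something like the following should work. It is only a sketch: `scaled`, `price_z` and `group_sizes` are names I made up for the check, and I keep the standardised price in its own column so the raw price stays available.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

SDcut <- 0.5

scaled <- diamonds %>%
  filter(cut %in% c("Premium", "Fair")) %>%
  mutate(price_z = c(scale(price)))  # hypothetical column name

# Lowest standardised price: if this is above -1,
# a -1 SD cutoff would leave the low group empty
min(scaled$price_z)

# Number of rows per cut and label at the chosen cutoff
group_sizes <- scaled %>%
  filter(abs(price_z) > SDcut) %>%
  mutate(label = ifelse(price_z > 0, paste("+", SDcut, "SD"), paste("-", SDcut, "SD"))) %>%
  count(cut, label)

group_sizes
```

If one of the groups is empty or very small, the cutoff is probably too strict for the data.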

Then plot:

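ggplot(diamondsgraph, aes(x = cut, y = carat, fill = label)) +
  # bar height = mean carat per cut and label
  stat_summary(geom = "bar", fun = "mean", position = position_dodge(1)) +
  # no summary function supplied, so stat_summary() defaults to mean_se() (mean +/- 1 SE)
  stat_summary(geom = "errorbar", position = position_dodge(1), width = 0.6)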
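
If you would rather see the numbers behind the bars, or keep the geom_col() / geom_errorbar() approach from the question, you can compute the summary yourself. This is a sketch of what I understand the stat_summary() calls to be doing, not a drop-in replacement; `bars`, `mean_carat`, `se` and `p` are names I picked for illustration.

```r
library(dplyr)
library(ggplot2)

SDcut <- 0.5

# One row per bar: mean carat and its standard error for each cut and label
bars <- diamonds %>%
  filter(cut %in% c("Premium", "Fair")) %>%
  mutate(price = c(scale(price))) %>%
  filter(abs(price) > SDcut) %>%
  mutate(label = ifelse(price > 0, paste("+", SDcut, "SD"), paste("-", SDcut, "SD"))) %>%
  group_by(cut, label) %>%
  summarise(mean_carat = mean(carat),
            se = sd(carat) / sqrt(n()),
            .groups = "drop")

bars

# Because the data has one row per bar, each column gets exactly one error bar
p <- ggplot(bars, aes(x = cut, y = mean_carat, fill = label)) +
  geom_col(position = position_dodge(0.9), colour = "black") +
  geom_errorbar(aes(ymin = mean_carat - se, ymax = mean_carat + se),
                position = position_dodge(0.9), width = 0.25) +
  theme_bw() + ylab("carat") + labs(fill = "price")

p
```

Note that the error bars are built from the y variable (carat) and its SE, not from price.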


Upvotes: 2
