Reputation: 37
I have a dataset with alcohol treatment rates for each state for each year from 2010 to 2015. Five of these states received an intervention and the rest did not. I would like to plot the treatment rates for each intervention state as a separate line and the non-intervention states (grouped as one line using the mean) on the same graph.
I would like to do this using ggplot in R. The code below graphs the treatment rates for each state independently. However, I am having trouble setting up the grouping variable so that it combines the intervention variable with the state variable to meet the condition described above. Any help would be appreciated. Thank you in advance!
I'm fairly new to R, so I hope I am explaining this correctly. The dataset is saved as a list, and below is some dummy data showing a snippet of the structure.
year state Intervention rate
2010 Alabama 0 0.006575294
2011 Alabama 0 0.002244153
2012 Alabama 0 0.002519527
2013 Alabama 0 0.00333051
2014 Alabama 0 0.002385317
2015 Alabama 0 0.003080964
2010 Alaska 1 0.00338454
2011 Alaska 1 0.003457992
2012 Alaska 1 0.002784511
2013 Alaska 1 0.00356925
2014 Alaska 1 0.004599099
2015 Alaska 1 0.004204394
2010 Arizona 0 0.002336875
2011 Arizona 0 0.002808161
2012 Arizona 0 0.00299025
2013 Arizona 0 0.0022956
ggplot(data = data, aes(x = year, y = rate, group = state)) +
  geom_line()
Upvotes: 1
Views: 2993
Reputation: 37913
Probably the easiest way is to separate the data based on the status of Intervention. I've generated a somewhat larger dummy dataset that should have a similar shape to the data you provided.
library(ggplot2)
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))
head(df)
#> year state Intervention rate
#> 1 2010 Alabama 0 -0.574740
#> 2 2011 Alabama 0 -1.121372
#> 3 2012 Alabama 0 -1.685824
#> 4 2013 Alabama 0 -2.575862
#> 5 2014 Alabama 0 -3.053054
#> 6 2015 Alabama 0 -4.051441
It's easier to separate the data if you need to handle the groups separately while plotting. You can subset the data in the data argument of a layer. As I understand it, you want to plot states with Intervention == 1 individually, so we do that with the regular geom_line(). Then we want to summarise all states with Intervention == 0, and to do that we use the stat_summary() function. We need to set a common group for the summarised data because we want to summarise across states: the constant aes(group = -1) overrides the group = state mapping inherited from ggplot().
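ggplot(df, aes(x = year, y = rate, group = state)) +
  # One coloured line per intervention state
  geom_line(
    data = ~ subset(., Intervention == 1),
    aes(colour = state)
  ) +
  # A single thick line: the yearly mean over all non-intervention states
  stat_summary(
    data = ~ subset(., Intervention == 0),
    aes(group = -1),
    fun.data = mean_se,
    geom = "line", size = 2 # in ggplot2 >= 3.4.0, use linewidth = 2 instead
  )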
Created on 2021-02-24 by the reprex package (v1.0.0)
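If you would rather have the averaged line appear in the legend alongside the intervention states, one alternative to subsetting inside each layer is to compute the yearly mean up front and bind it back onto the intervention rows. The following is only a sketch under that assumption: the names control, treated and plot_df and the label "Non-intervention (mean)" are made up for illustration, and the dummy data is rebuilt so that the block runs on its own.

```r
library(ggplot2)

# Rebuild the dummy data from above so this block is self-contained
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))

# Yearly mean rate across all non-intervention states (one row per year)
control <- aggregate(rate ~ year, data = subset(df, Intervention == 0), FUN = mean)
control$state <- "Non-intervention (mean)"  # illustrative label, not from the question

# Intervention states keep one row per state and year
treated <- subset(df, Intervention == 1, select = c(year, rate, state))
treated$state <- as.character(treated$state)

# rbind() matches data frame columns by name
plot_df <- rbind(treated, control)

p <- ggplot(plot_df, aes(x = year, y = rate, colour = state, group = state)) +
  geom_line()
p
```

The trade-off is that the summary is frozen into the data, so adding something like an error band later means computing it by hand rather than letting stat_summary() do it.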
Follow up:
You'd need to repeat the stat_summary() layer for every geom. For example, to add a ribbon with mean +/- sd values:
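stat_summary(
  data = ~ subset(., Intervention == 0),
  aes(group = -1),
  # Return the band limits per year; the ribbon geom needs ymin and ymax
  fun.data = function(x) {
    mx <- mean(x)
    s <- sd(x) # named s so the sd() function is not shadowed
    data.frame(
      ymin = mx - s,
      ymax = mx + s
    )
  },
  geom = "ribbon", alpha = 0.5
)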
You can replace "ribbon" with "errorbar" if you prefer that.
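Since the ribbon layer above is only a fragment, here is a sketch of how it might be combined with the mean line and the per-state lines in one plot. The helpers mean_sd() and controls() are hypothetical names introduced here to avoid repeating the same code in both summary layers; they are not part of ggplot2. The dummy data is rebuilt so the block runs on its own.

```r
library(ggplot2)

# Rebuild the dummy data so this block is self-contained
set.seed(1234)
states <- rownames(USArrests)
intervened <- sample(states, 10)
df <- expand.grid(year = 2010:2015, state = states)
df$Intervention <- as.numeric(df$state %in% intervened)
df$rate <- cumsum(rnorm(nrow(df)))

# Hypothetical helper: mean with a +/- 1 sd band, in the shape fun.data expects
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Hypothetical helper: keep only the non-intervention states
controls <- function(d) subset(d, Intervention == 0)

p <- ggplot(df, aes(x = year, y = rate, group = state)) +
  # Band first so the lines are drawn on top of it
  stat_summary(data = controls, aes(group = -1),
               fun.data = mean_sd, geom = "ribbon", alpha = 0.5) +
  stat_summary(data = controls, aes(group = -1),
               fun.data = mean_sd, geom = "line") +
  geom_line(data = ~ subset(., Intervention == 1), aes(colour = state))
p
```

Passing a function to the data argument works the same way as the ~ subset(., ...) formula: ggplot2 calls it with the plot's data and uses the result for that layer.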
Upvotes: 2