Miguel

Reputation: 436

How to combine layers based on different datasets with ggplot2 in R

I'd like to produce a plot that shows a time series followed by a set of forecast distributions, each represented as a violin plot. The example code below creates and plots a time series (as a line plot) and two violin plots separately.

library(ggplot2)

set.seed(12345)
x <- data.frame(time=1:50, dat=rnorm(50))

y1 <- rnorm(500)
y2 <- rnorm(500, sd=5)
y <- data.frame(time=as.factor(c(rep(51,500),rep(52,500))), dat=c(y1,y2)) 

ggplot(x, aes(x=time, y=dat)) +
  geom_line()

ggplot(y, aes(x=time, y=dat)) +
  geom_violin()

How can I combine these into a single chart with a line plot from time points 1 to 50 (along the x axis) followed by the two violin plots at time points 51 and 52, respectively?

Upvotes: 2

Views: 465

Answers (2)

Roman

Reputation: 4989

You need to convert your y$time factor to the integer values of its levels, add a grouping variable for the violins, and move your data = ... into the specific geoms.

# Convert the factor to the integer values of its level labels (51, 52), not its internal codes (1, 2)
y$time <- as.integer(levels(y$time))[y$time]

# Plot with grouping for the violin plot (as x is not a factor anymore) 
ggplot() + 
    geom_line(data = x, aes(x = time, y = dat)) + 
    geom_violin(data = y, aes(x = time, y = dat, group = time))
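
The conversion step above is easy to get wrong, so here is a minimal sketch of why the levels are indexed rather than calling as.integer() on the factor directly. The factor f is a hypothetical stand-in for y$time, not part of the original code.

```r
# Hypothetical factor mirroring y$time from the question (levels "51" and "52")
f <- factor(c(51, 51, 52))

# as.integer() on a factor returns the internal level codes
codes <- as.integer(f)

# Indexing the converted levels by the factor recovers the original values
labels <- as.integer(levels(f))[f]

# An equivalent, slightly more readable alternative
alt <- as.integer(as.character(f))

print(codes)   # 1 1 2
print(labels)  # 51 51 52
```

Using the codes would place the violins at x = 1 and x = 2, on top of the start of the line, instead of at 51 and 52.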

Upvotes: 2

Roman Luštrik

Reputation: 70643

I'm not sure you can plot discrete and continuous variables on the same axis, so you'll have to compromise. Markus has opted to discretize the x variables, while I prefer to make the time variable in y continuous. Notice that I've changed the way y is generated (removed the factor).

library(ggplot2)

set.seed(12345)
x <- data.frame(time=1:50, dat=rnorm(50))

y1 <- rnorm(500)
y2 <- rnorm(500, sd=5)
y <- data.frame(time=c(rep(51, 500), rep(52, 500)), dat=c(y1,y2)) 

ggplot(x, aes(x = time, y = dat)) +
  theme_bw() +
  scale_x_continuous(limits = c(0, 52)) +
  geom_line() + 
  geom_violin(data = y, aes(group = as.factor(time)))


Upvotes: 4
