How create gaps in smoother for "missing" values (R, ggplot)

Question

If I have a data set like this

set.seed(100)
data <- data.frame("x" = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5),
                   "y" = rnorm(13),
                   "factor" = c("a","b","c","a","b", "c", "c", "a",
                                "b", "c", "a", "b","c"))

so it looks like this

   x           y factor
1  1 -0.50219235      a
2  1  0.13153117      b
3  1 -0.07891709      c
4  2  0.88678481      a
5  2  0.11697127      b
6  2  0.31863009      c
7  3 -0.58179068      c
8  4  0.71453271      a
9  4 -0.82525943      b
10 4 -0.35986213      c
11 5  0.08988614      a
12 5  0.09627446      b
13 5 -0.20163395      c

I would like to plot this with a separate smoother each factor (a,b,c)

library(ggplot2)
ggplot(data = data, aes(x = x, y = y, col = factor)) + 
  geom_smooth(aes(group = factor))

However since there are no values for "a" and "b" for x = 3, so I would like the smoothers for "a" and "b" to have a break for x = 3. What's the best strategy to accomplish that?

Gavin Simpson · Accepted Answer

I would create an expansion of the combinations of x and factor and then do a database-like join on the combinations and the data. For example, first I form a new data frame df with the combinations of the unique values of x and factor

df <- expand.grid(sapply(data[, c("x", "factor")], unique))

> df
   x factor
1  1      a
2  2      a
3  3      a
4  4      a
5  5      a
6  1      b
7  2      b
8  3      b
9  4      b
10 5      b
11 1      c
12 2      c
13 3      c
14 4      c
15 5      c

Then we can simply perform a join operation on the df and your data, requesting that we return all the rows from the left hand side (the x argument, hence df), and include corresponding values for y from the right hand side (data). Where there is no corresponding right hand side (in data, we will get an NA.

newdf <- merge(df, data, all.x = TRUE)

> newdf
   x factor           y
1  1      a -0.50219235
2  1      b  0.13153117
3  1      c -0.07891709
4  2      a  0.88678481
5  2      b  0.11697127
6  2      c  0.31863009
7  3      a          NA
8  3      b          NA
9  3      c -0.58179068
10 4      a  0.71453271
11 4      b -0.82525943
12 4      c -0.35986213
13 5      a  0.08988614
14 5      b  0.09627446
15 5      c -0.20163395

Now we can fit and predict from a loess model by hand, but this is a little tedious - easier options are available via mgcv:gam()

loessFun <- function(XX, span = 0.85) {
  fit <- loess(y ~ x, data = XX, na.action = na.exclude, span = span)
  predict(fit)
}

Now split the data by factor and apply the loessFun() wrapper

fits <- lapply(split(newdf, newdf$factor), loessFun)
newdf <- transform(newdf, fitted = unsplit(fits, factor))

> head(newdf)
  x factor           y      fitted
1 1      a -0.50219235 -0.50219235
2 1      b  0.13153117  0.13153117
3 1      c -0.07891709 -0.07891709
4 2      a  0.88678481  0.88678481
5 2      b  0.11697127  0.11697127
6 2      c  0.31863009  0.31863009

We can then plot using the new data frame

ggplot(newdf, aes(x = x, y = y, col = factor)) + 
  geom_line(aes(group = factor))

which gives:

enter image description here

It looks a bit funky because of the very low resolution of the sample data you provided and because this method that I've used predicts at the observed data only, preserving NAs. geom_smooth() is actually predicting over the range of x for each group separately and as such there are no missing xs in the data used to draw the geom layer.

Unless you can explain within what region of x = 3 we should add a break (an NA), this may well be the best that you can do. Alternatively, we could predict over the region from the models and then set anything 2.5 < x < 3.5 back to being NA. Add a comment if that is what you wanted and I'll expand my answer with an example of doing that if you can indicate how we are to envisage the gaps.

How create gaps in smoother for "missing" values (R, ggplot)

Answers (1)

Related Questions

How create gaps in smoother for &quot;missing&quot; values (R, ggplot)

Answers (1)

Related Questions

How create gaps in smoother for "missing" values (R, ggplot)