Reputation: 159
If I have a data set like this
set.seed(100)
data <- data.frame("x" = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5),
"y" = rnorm(13),
"factor" = c("a","b","c","a","b", "c", "c", "a",
"b", "c", "a", "b","c"))
so it looks like this
x y factor
1 1 -0.50219235 a
2 1 0.13153117 b
3 1 -0.07891709 c
4 2 0.88678481 a
5 2 0.11697127 b
6 2 0.31863009 c
7 3 -0.58179068 c
8 4 0.71453271 a
9 4 -0.82525943 b
10 4 -0.35986213 c
11 5 0.08988614 a
12 5 0.09627446 b
13 5 -0.20163395 c
I would like to plot this with a separate smoother each factor (a,b,c)
library(ggplot2)
ggplot(data = data, aes(x = x, y = y, col = factor)) +
geom_smooth(aes(group = factor))
However since there are no values for "a" and "b" for x = 3, so I would like the smoothers for "a" and "b" to have a break for x = 3. What's the best strategy to accomplish that?
Upvotes: 1
Views: 1032
Reputation: 174853
I would create an expansion of the combinations of x
and factor
and then do a database-like join on the combinations and the data. For example, first I form a new data frame df
with the combinations of the unique values of x
and factor
df <- expand.grid(sapply(data[, c("x", "factor")], unique))
> df
x factor
1 1 a
2 2 a
3 3 a
4 4 a
5 5 a
6 1 b
7 2 b
8 3 b
9 4 b
10 5 b
11 1 c
12 2 c
13 3 c
14 4 c
15 5 c
Then we can simply perform a join operation on the df
and your data
, requesting that we return all the rows from the left hand side (the x
argument, hence df
), and include corresponding values for y
from the right hand side (data
). Where there is no corresponding right hand side (in data
, we will get an NA
.
newdf <- merge(df, data, all.x = TRUE)
> newdf
x factor y
1 1 a -0.50219235
2 1 b 0.13153117
3 1 c -0.07891709
4 2 a 0.88678481
5 2 b 0.11697127
6 2 c 0.31863009
7 3 a NA
8 3 b NA
9 3 c -0.58179068
10 4 a 0.71453271
11 4 b -0.82525943
12 4 c -0.35986213
13 5 a 0.08988614
14 5 b 0.09627446
15 5 c -0.20163395
Now we can fit and predict from a loess model by hand, but this is a little tedious - easier options are available via mgcv:gam()
loessFun <- function(XX, span = 0.85) {
fit <- loess(y ~ x, data = XX, na.action = na.exclude, span = span)
predict(fit)
}
Now split the data by factor
and apply the loessFun()
wrapper
fits <- lapply(split(newdf, newdf$factor), loessFun)
newdf <- transform(newdf, fitted = unsplit(fits, factor))
> head(newdf)
x factor y fitted
1 1 a -0.50219235 -0.50219235
2 1 b 0.13153117 0.13153117
3 1 c -0.07891709 -0.07891709
4 2 a 0.88678481 0.88678481
5 2 b 0.11697127 0.11697127
6 2 c 0.31863009 0.31863009
We can then plot using the new data frame
ggplot(newdf, aes(x = x, y = y, col = factor)) +
geom_line(aes(group = factor))
which gives:
It looks a bit funky because of the very low resolution of the sample data you provided and because this method that I've used predicts at the observed data only, preserving NA
s. geom_smooth()
is actually predicting over the range of x
for each group separately and as such there are no missing x
s in the data used to draw the geom layer.
Unless you can explain within what region of x = 3
we should add a break (an NA
), this may well be the best that you can do. Alternatively, we could predict over the region from the models and then set anything 2.5 < x < 3.5
back to being NA
. Add a comment if that is what you wanted and I'll expand my answer with an example of doing that if you can indicate how we are to envisage the gaps.
Upvotes: 2