Reputation: 697
I am using the bs function of the splines package to create a b-spline smoothing curve for graphical purposes. (There is at least one report that Excel uses a third order b-spline for its smooth line graphs, and I would like to be able to duplicate those curves.) I am having trouble understanding the arguments required by the bs function. Representative code follows below, as adapted from the bs documentation:
require(splines)
require(ggplot2)
n <- 10
x <- 1:10
y <- rnorm(n)
d <- data.frame(x=x, y=y)
summary(fm1 <- lm(y ~ bs(x, degree=3)), data=d)
x.spline <- seq(1, 10, length.out=n*10)
spline.data <- data.frame(x=x.spline, y=predict(fm1, data.frame(x=x.spline)))
ggplot(d, aes(x,y)) + geom_point + geom_line(aes(x,y), data=spline.data)
The example code in the bs documentation specifies df=5 in the call to bs, and does not specify degree. I have no idea how many degrees of freedom I have. All I know is that I want a third order b-spline. I have experimented with specifying different values of df instead of, or in addition to degree, and I get dramatically different results. This is why I suspect that a specification of df is the issue here. How would I calculate df in this context?
The help file suggests df = length(knots) + degree. If I treat the interior points as knots, this gives me df=11 for this example, which generates error messages and a nonsensical spline fit.
Thank you in advance.
I was apparently not clear in my intentions. I am trying to do this: How can I use spline() with ggplot?, but with b-splines.
Upvotes: 1
Views: 4748
Reputation: 697
I did a little more work and came up with the following. First, an apology. What I was looking for was a smoothing spline, rather than a regression spline. I did not have the vocabulary to phrase the question properly. While the example in the help file for bs() appears to provide this, the function does not provide the same behavior for my sample data. There is another function, smooth.spline, in the stats package, which offers what I needed.
set.seed(tunc(100000*pi))
n <- 10
x <- 1:n
xx <- seq(1, n, length.out=200)
y <- rnorm(n)
d <- data.frame(x=x, y=y)
spl <- smooth.spline(x,y, spar=0.1)
spline.data <- data.frame(y=predict(spl,xx))
ggplot(d,aes(x,y)) + geom_point() + geom_line(aes(x,y), spline.data)
spl2 <- smooth.spline(x, y, control=
list(trace=TRUE, tol=1e-6, spar=0.1, low=-1.5, high=0.3))
spline.data2 <- data.frame(predit(spl2,xx))
ggplot(d,aes(x,y)) + geom_point() + geom_line(aes(x,y), spline.data2)
The two calls to smooth.spline represent two approaches. The first specifies the smoothing parameter manually, and the second iterates to an optimal solution. I found that I had to constrain the optimization properly to get the type of solution I was after.
The result is intended to match the b-spline used by the Excel line plot. I have collaborators who consider Excel graphics to be the standard, and I need to at least match that performance.
Upvotes: 0
Reputation: 263321
You should not be trying to fit every point. The goal is to find a summary that is an acceptable fit but which depends on a limited number of knots. There is not much value in increasing hte degree of the polynomial above the default of three. With only 10 points you surely do not want df=11. Try df=5 and the results should be reasonably flat. The rms/Hnisc package author, Frank Harrell, prefers restricted cubic splines because the predictions at the extremes are linear and thus less wild than would occur with other polynomial bases.
I corrected a couple of misspellings and added a knots
argument to make your code work:
require(splines)
require(ggplot2); set.seed(trunc(100000*pi))
n <- 10
x <- 1:10
y <- rnorm(n)
d <- data.frame(x=x, y=y)
summary(fm1 <- lm(y ~ bs(x, degree=3, knots=2)), data=d)
x.spline <- seq(1, 10, length.out=n*10)
spline.data <- data.frame(x=x.spline, y=predict(fm1, data.frame(x=x.spline)))
ggplot(d, aes(x,y)) + geom_point() + geom_line(aes(x,y), data=spline.data)
I came away from the exercise of varying the randomseed with the opinion that Frank Harrell knows what he is talking about. I don't get the same sort of behavior at the extremes when using his packages.
Upvotes: 3