Reputation: 185
Using the mtcars
data set, I am trying to determine the broken line regression fit of mpg
as a function of hp
and wt
, with breakpoints coming only from hp
. Here is the code:
mpg = mtcars$mpg
wt = mtcars$wt
hp = mtcars$hp
reg = lm (mpg ~ hp +wt)
hp_thresh = 150
wt_thresh = 3
library(segmented)
seg.o = segmented (reg,seg.Z=~hp+wt, psi=list (hp=hp_thresh, wt=wt_thresh))
fted = broken.line (seg.o, term="hp") # mpg fit for hp breakpoints
fted$fit
values do not make sense (compare them to mpg values). I guess there is something simple I'm missing in the usage of the function(s).
EDIT:
To clarify further: I am looking to use only the segmented relationship between mpg
and hp
still including wt
in a linear fashion. How can one do that?
I tried using hp_thresh
in the input for segmented()
so as to have the broken relationship only for hp
but I get this error: "A wrong number of terms in seg.Z
or psi
".
Upvotes: 2
Views: 899
Reputation: 15907
You are doing things correctly, but you misinterpret the meaning of the output of broken.lines()
(in your notation, fted$fit
). These values should not be expected to agree with mpg
.
You can keep the linear relationship for some of the dependent variables by supplying to seg.Z
only those variables, where you want to have a broken line.
You are creating a model for mpg
as a function of two variables, hp
and wt
. The model is piecewise linear for in both these variables. It is important to note that in order to predict a value for mpg
, you need to specify both, hp
and wt
.
The function broken.line()
is used to calculate predictions for mpg
if one of the variables, specified in the argument term
is varied. You do that for the variable hp
as follows:
fted = broken.line (seg.o, term="hp")
But remember: in order to predict, you also need to specify a value for wt
. The values you get in fted$fit
are actually calculated with wt
kept at zero.
You can check this by using predict()
, which is the function to be used, when you want to obtain predictions from your model:
pred <- predict(seg.o, data.frame(hp = hp, wt = 0))
pred - fted$fit
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
If you want to compare the model predictions with mpg
, you have to use the same values for both variables, hp
and wt
, as for the fit. You can also do this with predict:
pred <- predict(seg.o, data.frame(hp = hp, wt = wt))
plot(hp, mpg)
points(hp, pred, col = "red")
As you can see, the values produced in this way agree more or less with mpg
. It is in this way that you should check the quality of your model, and not by using broken.line()
.
You can also look at the piecewise linear relationship between mpg
and one of the variables. You have calculated these values already in fted
, but you could also directly plot it from seg.o
as follows:
plot(seg.o, term = "hp")
points(hp, fted$fit, col = "red")
Note that also for plot()
, you need to specify which variable to vary (here hp
). The other is again kept at zero. I have also added the points from fted$fit
to show you that the plot is actually equivalent to what is calculated by broken.line
.
If you want to fit a broken line in only one of the dependent variables and keep the linear relationship for the other(s), you need to only specify those variables in seg.Z
that you want the broken line for. You asked to have the broken line for hp
only, which can be done as follows:
seg.o2 <- segmented(reg, seg.Z = ~hp, psi = list(hp = hp_thresh))
The show that this worked, I predict the model with only one of the variables varied and plot.
Varying hp
, keeping wt
fixed`:
pred_hp <- predict(seg.o2, data.frame(hp = 50:300, wt = 0))
plot(50:300, pred_hp, type = "l")
Varying wt
, keeping hp
fixed`:
pred_wt <- predict(seg.o2, data.frame(wt = seq(1.5, 5.5, by = 0.1), hp = 0))
plot(seq(1.5, 5.5, by = 0.1), pred_wt, type = "l")
As you can see, your model for mpg
depends linearly on wt
, but has a broken linear relationship with hp
.
Upvotes: 2