How to fit with a broken line in only one of two dependent variables?

Question

Using the mtcars data set, I am trying to determine the broken line regression fit of mpg as a function of hp and wt, with breakpoints coming only from hp. Here is the code:

mpg = mtcars$mpg
wt = mtcars$wt
hp = mtcars$hp

reg = lm (mpg ~ hp +wt)

hp_thresh = 150
wt_thresh = 3

library(segmented)
seg.o = segmented (reg,seg.Z=~hp+wt, psi=list (hp=hp_thresh, wt=wt_thresh))
fted = broken.line (seg.o, term="hp") # mpg fit for hp breakpoints

fted$fit values do not make sense (compare them to mpg values). I guess there is something simple I'm missing in the usage of the function(s).

EDIT:

To clarify further: I am looking to use only the segmented relationship between mpg and hp still including wt in a linear fashion. How can one do that?

I tried using hp_thresh in the input for segmented() so as to have the broken relationship only for hp but I get this error: "A wrong number of terms in seg.Z or psi".

Stibu · Accepted Answer

Summary

You are doing things correctly, but you misinterpret the meaning of the output of broken.lines() (in your notation, fted$fit). These values should not be expected to agree with mpg.

You can keep the linear relationship for some of the dependent variables by supplying to seg.Z only those variables, where you want to have a broken line.

What broken.line() does

You are creating a model for mpg as a function of two variables, hp and wt. The model is piecewise linear for in both these variables. It is important to note that in order to predict a value for mpg, you need to specify both, hp and wt.

The function broken.line() is used to calculate predictions for mpg if one of the variables, specified in the argument term is varied. You do that for the variable hp as follows:

fted = broken.line (seg.o, term="hp")

But remember: in order to predict, you also need to specify a value for wt. The values you get in fted$fit are actually calculated with wt kept at zero.

You can check this by using predict(), which is the function to be used, when you want to obtain predictions from your model:

pred <- predict(seg.o, data.frame(hp = hp, wt = 0))
pred - fted$fit
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
##  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Creating predictions from the model

If you want to compare the model predictions with mpg, you have to use the same values for both variables, hp and wt, as for the fit. You can also do this with predict:

pred <- predict(seg.o, data.frame(hp = hp, wt = wt))
plot(hp, mpg)
points(hp, pred, col = "red")

As you can see, the values produced in this way agree more or less with mpg. It is in this way that you should check the quality of your model, and not by using broken.line().

Plotting the piecewise linear relationship

You can also look at the piecewise linear relationship between mpg and one of the variables. You have calculated these values already in fted, but you could also directly plot it from seg.o as follows:

plot(seg.o, term = "hp")
points(hp, fted$fit, col = "red")

Note that also for plot(), you need to specify which variable to vary (here hp). The other is again kept at zero. I have also added the points from fted$fit to show you that the plot is actually equivalent to what is calculated by broken.line.

Fit a broken line for only one variable

If you want to fit a broken line in only one of the dependent variables and keep the linear relationship for the other(s), you need to only specify those variables in seg.Z that you want the broken line for. You asked to have the broken line for hp only, which can be done as follows:

seg.o2 <- segmented(reg, seg.Z = ~hp, psi = list(hp = hp_thresh))

The show that this worked, I predict the model with only one of the variables varied and plot.

Varying hp, keeping wt fixed`:

pred_hp <- predict(seg.o2, data.frame(hp = 50:300, wt = 0))
plot(50:300, pred_hp, type = "l")

Varying wt, keeping hp fixed`:

pred_wt <- predict(seg.o2, data.frame(wt = seq(1.5, 5.5, by = 0.1), hp = 0))
plot(seq(1.5, 5.5, by = 0.1), pred_wt, type = "l")

As you can see, your model for mpg depends linearly on wt, but has a broken linear relationship with hp.