clover2004
clover2004

Reputation: 1

how to add regression lines for each factor on a plot

I've created a model and I'm trying to add curves that fit the two parts of the data, insulation and no insulation. I was thinking about using the insulation coefficient as a true/false term, but I'm not sure how to translate that into code. Entries 1:56 are "w/o" and 57:101 are "w/". I'm not sure how to include the data I'm using but here's the head and tail:

  month year  kwh days est   cost avgT dT.yr   kWhd.1 id insulation
1     8 2003  476   21   a  33.32   69    -8 22.66667  1        w/o
2     9 2003 1052   30   e 112.33   73    -1 35.05172  2        w/o
3    10 2003  981   28   a  24.98   60    -6 35.05172  3        w/o
4    11 2003 1094   32   a  73.51   53     2 34.18750  4        w/o
5    12 2003 1409   32   a  93.23   44     6 44.03125  5        w/o
6     1 2004 1083   32   a  72.84   34     3 33.84375  6        w/o

    month year kwh days est  cost avgT dT.yr   kWhd.1  id insulation
96      7 2011 551   29   e 55.56   72     0 19.00000  96         w/
97      8 2011 552   27   a 61.17   78     1 20.44444  97         w/
98      9 2011 666   34   e 73.87   71    -2 19.58824  98         w/
99     10 2011 416   27   a 48.03   64     0 15.40741  99         w/
100    11 2011 653   31   e 72.80   53     1 21.06452 100         w/
101    12 2011 751   33   a 83.94   45     2 22.75758 101         w/
bill$id <- seq(1:101)
bill$insulation <- as.factor(ifelse(bill$id > 56, c("w/"), c("w/o")))

m1 <- lm(kWhd.1 ~ avgT + insulation + I(avgT^2), data=bill)

with(bill, plot(kWhd.1 ~ avgT, xlab="Average Temperature (F)", 
                ylab="Daily Energy Use (kWh/d)", col=insulation))

no_ins <- data.frame(bill$avgT[1:56], bill$insulation[1:56])
curve(predict(m1, no_ins=x), add=TRUE, col="red")

ins <- data.frame(bill$avgT[57:101], bill$insulation[57:101])
curve(predict(m1, ins=x), add=TRUE, lty=2)

legend("topright", inset=0.01, pch=21, col=c("red", "black"), 
       legend=c("No Insulation", "Insulation"))

Upvotes: 0

Views: 64

Answers (2)

jay.sf
jay.sf

Reputation: 73802

In base R it's important to order the x-values. Since this is to be done on multiple factors, we can do this with by, resulting in a list L.

Since your example data is not complete, here's an example with iris where we consider Species as the "factor".

L <- by(iris, iris$Species, function(x) x[order(x$Petal.Length), ])

Now we can do the plot and add loess predictions as lines with a sapply.

with(iris, plot(Sepal.Width ~ Petal.Length, col=Species))
sapply(seq(L), function(x) 
  lines(L[[x]]$Petal.Length, 
        predict(loess(Sepal.Width ~ Petal.Length, L[[x]], span=1.1)),  # span=1.1 for smoothing
        col=x))

Yields

enter image description here

Upvotes: 0

Gregor Thomas
Gregor Thomas

Reputation: 146224

ggplot2 makes this a lot easier than base plotting. Something like this should work:

ggplot(bill, aes(x = avgT, y = kWhd.1, color = insulation)) +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
  geom_point()

In base, I'd create a data frame with point you want to predict on, something like

pred_data = expand.grid(
  kWhd.1 = seq(min(bill$kWhd.1), max(bill$kWhd.1), length.out = 100),
  insulation = c("w/", "w/o")
)
pred_data$prediction = predict(m1, newdata = pred_data)

And then use lines to add the predictions to your plot. My base graphics is pretty rusty, so I'll leave that to you (or another answerer) if you want it.

Upvotes: 1

Related Questions