Rizi

Reputation: 11

Run a linear regression that uses one column to predict the log of another column?

I have this data:

        A     B
  1 632364    4
  2 144599    2
  3 3715821   2
  4 184524    5
  5 1674      11
  6 0         4
  7 8019      7
  8 25992     6
  9 0         16
 10 0         15
 11 19172040  2

I am trying to run a linear regression that uses B to predict the log of A.

I tried doing this:

B.lm <- lm(B ~ A)
log.A <- as.data.frame(log(A))
predict(B.lm, log.A, interval="predict")

What it gives me doesn't seem right.

Any ideas? Also, I'm not sure how to handle the log of zero.

Upvotes: 1

Views: 278

Answers (1)

G. Grothendieck

Reputation: 269654

Exclude the rows where A is 0 so that the log is meaningful (log(0) is -Inf in R). The limitation is that the resulting model will never be able to predict a zero. If the zeros really represent missing values, and those values are missing at random, then this may not matter. The input dd is shown reproducibly in the Note at the end. The code below fits the model with log(A) as the response and B as the predictor, plots the points for which A > 0, and then draws a line through the fitted (i.e. predicted) values.

ddpos <- subset(dd, A > 0)
fm <- lm(log(A) ~ B, ddpos)
plot(log(A) ~ B, ddpos)
abline(fm)

The last line could alternatively be written as:

lines(fitted(fm) ~ B, ddpos)

In either case we get this figure:

[Figure: scatter plot of log(A) against B for the rows with A > 0, with the fitted regression line]
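Since the question asks about predict(), here is a minimal sketch of how predictions with intervals might be obtained from the same model. It rebuilds the input with data.frame() so it runs on its own; newB is a hypothetical set of B values chosen only for illustration and does not come from the question.

```r
# Input data, same values as dd in the Note, rebuilt so the block is self-contained.
dd <- data.frame(
  A = c(632364, 144599, 3715821, 184524, 1674, 0, 8019, 25992, 0, 0, 19172040),
  B = c(4, 2, 2, 5, 11, 4, 7, 6, 16, 15, 2)
)

ddpos <- subset(dd, A > 0)          # drop rows where log(A) would be -Inf
fm <- lm(log(A) ~ B, data = ddpos)  # log(A) is the response, B the predictor

# Hypothetical new B values (an assumption, for illustration only).
newB <- data.frame(B = c(3, 8, 12))

# Point predictions and 95% prediction intervals on the log scale.
pred_log <- predict(fm, newdata = newB, interval = "prediction")

# Back-transform to the original scale of A.
pred_A <- exp(pred_log)
print(cbind(newB, pred_A))
```

Note that newdata must contain a column named B, because B is the predictor in the formula. Also, exp() of a prediction made on the log scale is generally better read as an estimate of the median of A than of its mean.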

Note: We used this as the input:

dd <- structure(list(A = c(632364L, 144599L, 3715821L, 184524L, 1674L, 
0L, 8019L, 25992L, 0L, 0L, 19172040L), B = c(4L, 2L, 2L, 5L, 
11L, 4L, 7L, 6L, 16L, 15L, 2L)), .Names = c("A", "B"), 
class = "data.frame", row.names = 
c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
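If dropping the zeros is not acceptable, one different workaround (not what the answer above does) is to model log(1 + A) instead of log(A). The following is only a sketch under that assumption: the offset of 1 is an arbitrary choice that affects the fit, and fm1 and fitted_A are names introduced here for illustration.

```r
# Input data, same values as dd in the Note, rebuilt so the block is self-contained.
dd <- data.frame(
  A = c(632364, 144599, 3715821, 184524, 1674, 0, 8019, 25992, 0, 0, 19172040),
  B = c(4, 2, 2, 5, 11, 4, 7, 6, 16, 15, 2)
)

# log1p(A) computes log(1 + A), which is 0 when A is 0, so all 11 rows are kept.
fm1 <- lm(log1p(A) ~ B, data = dd)

# Back-transform the fitted values with expm1(), the inverse of log1p().
fitted_A <- expm1(fitted(fm1))

plot(log1p(A) ~ B, dd)
abline(fm1)
```

Unlike the model fitted on ddpos, this one uses the three rows with A equal to 0, so its coefficients will differ from those of fm.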

Upvotes: 1
