Reputation: 11
I have this data:
A B
1 632364 4
2 144599 2
3 3715821 2
4 184524 5
5 1674 11
6 0 4
7 8019 7
8 25992 6
9 0 16
10 0 15
11 19172040 2
I am trying to run a linear regression on B to predict the log of A.
I tried doing this:
B.lm <- lm(B ~ A)
log.A <- as.data.frame(log(A))
predict(B.lm,log.A, interval="predict")
What it gives me doesn't seem right.
Any ideas? I'm also not sure how to handle the log of zero.
Upvotes: 1
Views: 278
Reputation: 269654
Exclude the 0 values so that the log is meaningful (log(0) is -Inf). The limitation is that the resulting model can never predict a zero. If the zeros really represent missing values, and those values are missing at random, then this may not matter. Also note that the response goes on the left-hand side of the formula, so predicting log(A) from B is written log(A) ~ B rather than B ~ A. The input dd is shown reproducibly in the Note at the end. The code below fits the model, plots the points for which A > 0, and adds a line for the fitted (i.e. predicted) values.
ddpos <- subset(dd, A > 0)
fm <- lm(log(A) ~ B, ddpos)
plot(log(A) ~ B, ddpos)
abline(fm)
The last line could alternatively be written as:
lines(fitted(fm) ~ B, ddpos)
In either case we get the same figure: a scatterplot of log(A) against B with the fitted line drawn through it.
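Since the question also asked about predict, here is a minimal sketch of how prediction intervals could be obtained from this kind of fit. The B values in newdata are made up for illustration, and dd is rebuilt inline with the same values as in the Note so that the snippet runs on its own. The key points are that newdata needs a column named B (the predictor in the formula) and that the predictions come back on the log scale.

```r
# Same values as dd in the Note, rebuilt here so the snippet is self-contained.
dd <- data.frame(
  A = c(632364, 144599, 3715821, 184524, 1674, 0, 8019, 25992, 0, 0, 19172040),
  B = c(4, 2, 2, 5, 11, 4, 7, 6, 16, 15, 2)
)

# Keep only rows where log(A) is finite, then fit log(A) on B.
ddpos <- subset(dd, A > 0)
fm <- lm(log(A) ~ B, data = ddpos)

# Hypothetical new B values (an assumption, not from the question).
# The column must be named B to match the model formula.
newdata <- data.frame(B = c(3, 8, 12))

# Matrix with columns fit, lwr, upr: predicted log(A) and 95% prediction limits.
pred <- predict(fm, newdata, interval = "prediction")
print(pred)

# Naive back-transform to the scale of A. exp() of the fitted value
# is not an unbiased estimate of the mean of A, so treat it as rough.
print(exp(pred))
```

The back-transformed interval limits are still valid as limits for A, but exp(fit) should be read as a rough central value rather than as the expected value of A.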
Note: We used this as the input:
dd <- structure(list(A = c(632364L, 144599L, 3715821L, 184524L, 1674L,
0L, 8019L, 25992L, 0L, 0L, 19172040L), B = c(4L, 2L, 2L, 5L,
11L, 4L, 7L, 6L, 16L, 15L, 2L)), .Names = c("A", "B"),
class = "data.frame", row.names =
c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
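To see why the zero rows have to be dropped before fitting, the following small check may help. It uses a made-up three-row subset of the data (the vectors A and B below are illustrative, not the full input) and assumes nothing beyond base R: log(0) evaluates to -Inf, and lm refuses to fit when the response contains a non-finite value.

```r
# Three illustrative rows taken from the data, including one zero in A.
A <- c(632364, 0, 8019)
B <- c(4, 4, 7)

# The second element is -Inf because log(0) is -Inf.
print(log(A))

# lm() cannot fit with a non-finite response; capture the error message
# instead of stopping the script.
res <- tryCatch(
  lm(log(A) ~ B),
  error = function(e) conditionMessage(e)
)
print(res)

# Restricting to A > 0 leaves only finite values on the log scale.
print(all(is.finite(log(A[A > 0]))))
```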
Upvotes: 1