Reputation: 17
I have a scatterplot of Daily Rainfall (x) and Observations (y), which looks like the positive half of an x^(-2) curve, or a logarithm to the base 1/2. Basically, y values are very high when x values are very low, and the bigger x gets, the lower y becomes. But the rate at which y decreases slows down, and y is never negative.
Here is a representative sample:
rain <- c(1, 1.2, 1.3, 2.5, 3.2, 4.2, 5, 7, 7.5, 10.3, 11.7, 12.9, 14.1, 15, 15.5, 17.5, 18.3, 20, 20.2, 20.3, 25, 28, 30, 34, 40)
obs <- c(42, 44, 43.9, 43.5, 35, 22, 18.4, 15.3, 10, 6.2, 5.7, 4, 3.7, 2.3, 2, 2.7, 3.5, 3, 2.9, 4, 1.6, 2.2, 1.6, 1.3, 0.8)
Now I want to fit a regression model to this scatterplot. I already tried polynomial regression with terms up to x^(-4), but I want to try a logarithmic regression as well, because I think it might turn out to be a higher-quality model.
Here is what I did so far with polynomial models:
y <- data$obs
x <- data$rain
xsq <- x^-2
xcub <- x^-3
xquar <- x^-4
fit4 <- lm(y~x+xsq+xcub+xquar) # I did the same for fits 1-3; each fit up to fit4 becomes more significant
xv <- seq(min(x), max(x), 0.01)
yv <- predict(fit4, list(x=xv, xsq=xv^-2, xcub=xv^-3, xquar=xv^-4))
lines(xv, yv)
And this is what I tried for logarithmic models, but it just returns straight lines that don't match the curve. I feel like log() is not the function I really need.
xlog <- log(x)
fitlogx <- lm(y~xlog)
xv <- seq(min(xlog), max(xlog), 0.01)
yv <- predict(fitlogx, list(xlog=xv))
abline(fitlogx)
ylog <- log(y)
fitlogy <- lm(ylog~x)
xv <- seq(min(x), max(x), 0.01)
yv <- predict(fitlogy, list(x=xv))
abline(fitlogy)
Now I would like to know how I can fit a logarithmic function that makes sense. I'm also thankful for any advice if you know another type of regression model that could be useful.
Upvotes: 0
Views: 1172
Reputation: 11056
Your obs variable fits pretty well to the inverse of rain. For example:
dev.new(width=12, height=6)
oldp <- par(mfrow=c(1, 2))
plot(obs~rain)
lines(rain, 1/rain*40)
The curve needs to be a bit higher. We could keep guessing, e.g. try 1/rain*60, but it is easier to use the nls function to get the best least-squares fit to the equation:
obs.nls <- nls(obs~1/rain*k, start=list(k=40))
summary(obs.nls)
#
# Formula: obs ~ 1/rain * k
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# k 57.145 4.182 13.66 8.12e-13 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 6.915 on 24 degrees of freedom
#
# Number of iterations to convergence: 1
# Achieved convergence tolerance: 2.379e-09
plot(obs~rain)
pred <- predict(obs.nls)
points(rain, pred, col="red", pch=18)
pred.rain <- seq(1, 40, length.out=100)
pred.obs <- predict(obs.nls, list(rain=pred.rain))
lines(pred.rain, pred.obs, col="blue", lty=2)
So the best estimate for k is 57.145. The main drawback to nls is that you must provide starting values for the coefficient(s). It can also fail to converge, but for the simple function we are using here, it works fine as long as you can estimate reasonable starting values.
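If picking starting values is a nuisance, R also ships self-starting models for nls. A minimal sketch using SSasymp, which models an exponential decay toward an asymptote and estimates its own starting values internally; whether it fits this data better than the inverse model is something you would have to check:

```r
# rain/obs data from the question
rain <- c(1, 1.2, 1.3, 2.5, 3.2, 4.2, 5, 7, 7.5, 10.3, 11.7, 12.9, 14.1,
          15, 15.5, 17.5, 18.3, 20, 20.2, 20.3, 25, 28, 30, 34, 40)
obs <- c(42, 44, 43.9, 43.5, 35, 22, 18.4, 15.3, 10, 6.2, 5.7, 4, 3.7,
         2.3, 2, 2.7, 3.5, 3, 2.9, 4, 1.6, 2.2, 1.6, 1.3, 0.8)

# SSasymp fits obs = Asym + (R0 - Asym) * exp(-exp(lrc) * rain):
# an exponential decay from R0 (the value at rain = 0) toward the
# asymptote Asym. No start= argument is required.
obs.ss <- nls(obs ~ SSasymp(rain, Asym, R0, lrc))
summary(obs.ss)
```

Self-starting models like SSasymp, SSlogis, and SSmicmen cover many common curve shapes, so they are worth checking before hand-tuning start values.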
If rain can have zero values, you can add a constant a to the denominator:
obs.nls <- nls(obs ~ k / (a + rain), start=list(a=1, k=40))
summary(obs.nls)
#
# Formula: obs ~ k/(a + rain)
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# a 1.4169 0.4245 3.337 0.00286 **
# k 117.5345 16.6878 7.043 3.55e-07 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 4.638 on 23 degrees of freedom
# Number of iterations to convergence: 10
# Achieved convergence tolerance: 6.763e-06
Notice that the residual standard error is smaller, but the curve overestimates the actual values for rain > 10.
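One cross-check worth knowing (a property of the model, not something the fit above depends on): obs ~ k/rain is linear in its single coefficient k, so an ordinary lm on the transformed predictor reproduces the one-parameter nls estimate in closed form, with no starting values at all:

```r
rain <- c(1, 1.2, 1.3, 2.5, 3.2, 4.2, 5, 7, 7.5, 10.3, 11.7, 12.9, 14.1,
          15, 15.5, 17.5, 18.3, 20, 20.2, 20.3, 25, 28, 30, 34, 40)
obs <- c(42, 44, 43.9, 43.5, 35, 22, 18.4, 15.3, 10, 6.2, 5.7, 4, 3.7,
         2.3, 2, 2.7, 3.5, 3, 2.9, 4, 1.6, 2.2, 1.6, 1.3, 0.8)

# Suppressing the intercept (- 1) leaves only the slope on 1/rain,
# which is exactly the k in obs ~ k/rain. Least squares is solved
# directly here, so no iteration is needed.
fit.lm <- lm(obs ~ I(1/rain) - 1)
coef(fit.lm)  # about 57.14, matching the nls estimate
```

This trick only works for the intercept-free model; the two-parameter form k/(a + rain) is genuinely nonlinear in a and still needs nls.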
Upvotes: 2