HelmiYeet
HelmiYeet

Reputation: 17

How to fit logarithmic regression to a "negative exponential" scatterplot in R

I have a scatterplot of Daily Rainfall (x) and Observations (y), which looks like the right/positive x-value half of a x^-2 plot or a logarithmic plot to the base of 1/2. Basically y values are really high when x values are really low. The bigger the x values get, the lower y's become. But the rate at which y values decrease becomes slower and y's are never negative.

Here is a representative sample:

 rain <- c(1, 1.2, 1.3, 2.5, 3.2, 4.2, 5, 7, 7.5, 10.3, 11.7, 12.9, 14.1, 15, 15.5, 17.5, 18.3, 20, 20.2, 20.3, 25, 28, 30, 34, 40)

 obs <- c(42, 44, 43.9, 43.5, 35, 22, 18.4, 15.3, 10, 6.2, 5.7, 4, 3.7, 2.3, 2, 2.7, 3.5, 3, 2.9, 4, 1.6, 2.2, 1.6, 1.3, 0.8)

Now I want to fit a regression model to this scatterplot. I already tried polynomial regression up until x^-4, but I want to try a logarithmic regression as well, because I think it might turn out to be a higher quality model.

Here is what I did so far with polynomial models:

    y <- data$obs
    x <- data$rain
    xsq <- x^-2
    xcub <- x^-3
    xquar <- x^-4
    

    fit4 <- lm(y~x+xsq+xcub+xquar) # I did the same for fit 1-3; until fit 4 it becomes more significant
    xv <- seq(min(x), max(x), 0.01)
    yv <- predict(fit5, list(x=xv, xsq=xv^-2, xcub=xv^-3, xquar=xv^-4))
    lines(xv, yv)

And this is what I tried for logarithmic models, but it just returns straight lines that don't match the curve. I feel like log() is not the function I really need.

xlog <- log(x)
fitlogx <- lm(y~xlog)
xv <- seq(min(xlog), max(xlog), 0.01)
yv <- predict(fitlogx, list(x=xv))
abline(fitlogx)

ylog <- log(y)
fitlogy <- lm(ylog~x)
xv <- seq(min(x), max(x), 0.01)
yv <- predict(fitlogy, list(x=xv))
abline(fitlogy)

Now I would like to know how I can fit a logarithmic function that makes sense. I'm also thankful for any advice if you know another type of regression model that could be useful.

Upvotes: 0

Views: 1172

Answers (1)

dcarlson
dcarlson

Reputation: 11056

Your obs variable fits pretty well to the inverse of rain. For example

dev.new(width=12, height=6)
oldp <- par(mfrow=c(1, 2))
plot(obs~rain)
lines(rain, 1/rain*40)

The curve needs to be a bit higher. We could guess repeatedly, e.g. try rain*60, but it is easier to use the nls function to get the best least squares fit to the equation:

obs.nls <- nls(obs~1/rain*k, start=list(k=40))
summary(obs.nls)
# 
# Formula: obs ~ 1/rain * k
# 
# Parameters:
#   Estimate Std. Error t value Pr(>|t|)    
# k   57.145      4.182   13.66 8.12e-13 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 6.915 on 24 degrees of freedom
# 
# Number of iterations to convergence: 1 
# Achieved convergence tolerance: 2.379e-09
plot(obs~rain)
pred <- predict(obs.nls)
points(rain, pred, col="red", pch=18)
pred.rain <- seq(1, 40, length.out=100)
pred.obs <- predict(obs.nls, list(rain=pred.rain))
lines(pred.rain, pred.obs, col="blue", lty=2)

So the best estimate for k is 57.145. The main drawback to nls is that you must provide starting values for the coefficient(s). Also it can fail to converge, but for the simple function we are using here, it works fine as long as you can estimate reasonable starting values.

Plots

If rain can have zero values, you can add an intercept:

obs.nls <- nls(obs ~ k / (a + rain), start=list(a=1, k=40))
summary(obs.nls)
# 
# Formula: obs ~ k/(a + rain)
# 
# Parameters:
#   Estimate Std. Error t value Pr(>|t|)    
# a   1.4169     0.4245   3.337  0.00286 ** 
# k 117.5345    16.6878   7.043 3.55e-07 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 4.638 on 23 degrees of freedom

Number of iterations to convergence: 10 
Achieved convergence tolerance: 6.763e-06

Notice that the standard error is smaller, but the curve overestimates the actual values for rain > 10.

Upvotes: 2

Related Questions