Guy Sutton
Guy Sutton

Reputation: 45

Issue with nls model specification in R (singular gradient error)

I am trying to fit a non-linear model using nls in R, however, I can't seem to get the model specification correct. Any advice on how to correct the model, would be much appreciated. A minimal example below.

# Import dummy data 
data <- read.csv("https://raw.githubusercontent.com/guysutton/nls_singular_repo/main/data2.csv")
head(data)

I think the specification of the fixed effects is what is causing the issue. I am not experienced with fitting 'nls' and picking starting values, as several other StackOverflow posts have indicated that the error I am getting could be because of poor starting values. I have tried starting pH(1:3, in increments of 0.1).

# Fit NLS model 
mod <- stats::nls(
  # Response variable
  y ~
    # Predictors 
     ( (x1 + x2) / (-0.6258*pH^4+11.847*pH^3-79.884*pH^2+230.97*(pH)-242.64) ),
  start = list(pH = 1),
  control = nls.control(maxiter = 1000),
  data = data
)

Error in nls(y ~ ((x1 + x2)/(-0.6258 * pH^4 + 11.847 * pH^3 - 79.884 * : singular gradient

However, when I take the fixed effect expression and manually input values for the first row of data, I get the correct value.

# This produces the correct value for the first row of data when I sub in the values 
(x1 + x2) / (-0.6258*pH^4+11.847*pH^3-79.884*pH^2+230.97*(pH)-242.64)
(1.50 + 0.45) / (-0.6258*4.41^4+11.847*4.41^3-79.884*4.41^2+230.97*(4.41)-242.64)

[1] 1.132759

Any advice on what might be causing my issue, and how to potentially solve this issue, would be much appreciated.

Upvotes: 0

Views: 80

Answers (1)

AndS.
AndS.

Reputation: 8110

The problem (as pointed out in the comments) is that you are trying to determine the coefficients for your model with nls. The way you currently have the model set up, you are trying to predict a single pH value that describes all the variability in y given x1 and x2. I think what you really want to to determine the coefficients for your input pH values that give the best model. Maybe something like:

library(tidyverse)

mod <- stats::nls(
  # Response variable
  y ~
    # Predictors 
     ( (x1 + x2) /(k*pH + k2*pH^2 + k3*pH^3 + k4*pH^4) ),
  start = list(k = 10, k2 = 1, k3 = 0.3, k4 = 0.03),
  control = nls.control(maxiter = 1000, tol = 1e-3, minFactor = 1e-5),
  data = data
)

summary(mod)
#> 
#> Formula: y ~ ((x1 + x2)/(k * pH + k2 * pH^2 + k3 * pH^3 + k4 * pH^4))
#> 
#> Parameters:
#>      Estimate Std. Error t value Pr(>|t|)    
#> k  -113.65175    4.31320  -26.35   <2e-16 ***
#> k2   82.57272    3.20163   25.79   <2e-16 ***
#> k3  -19.84041    0.78430  -25.30   <2e-16 ***
#> k4    1.58837    0.06349   25.02   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.6587 on 5076 degrees of freedom
#> 
#> Number of iterations to convergence: 13 
#> Achieved convergence tolerance: 0.0007377

data|>
  mutate(mod = predict(mod)) |>
  ggplot(aes(y, mod))+
  geom_point()+
  scale_x_log10()+
  scale_y_log10()

Upvotes: 1

Related Questions