hemicellulose

Using nls and ggplot2 to fit a logarithmic curve to data

I am using R to fit a logarithmic curve to my data. The curve has the equation:

y = a * log(b * x)

My data looks like this:

library(ggplot2)

# Creating example data
pre <- c(946116, 1243227, 1259646, 1434124, 1575268, 2192526, 3252832, 6076519)
post <- c(907355, 1553586, 1684253, 2592938, 1919173, 1702644, 3173743, 3654198)
data <- data.frame(pre, post)

# Plotting data
ggplot(data, aes(x=pre, y=post))+
  geom_point()


But when I try to fit a logarithmic curve using geom_smooth, I get an error.

# Fitting logarithmic curve
ggplot(data, aes(x=pre, y=post))+
  geom_point()+
  geom_smooth(method="nls", se=FALSE,
              method.args=list(formula=y~a*log(b*x),
                               start=c(a=100, b=2)))

Warning messages:

1: In log(b * x) : NaNs produced
2: Computation failed in `stat_smooth()`:
Missing value or an infinity produced when evaluating the model 

I get similar issues when I try to fit the same logarithmic model with nls directly, without ggplot:

model <- nls(data=data, 
             formula=y~a*log(b*x),
             start=list(a=100, b=2))

Error and warning messages:

Error in numericDeriv(form[[3L]], names(ind), env) : 
  Missing value or an infinity produced when evaluating the model
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In log(b * x) : NaNs produced

As someone who is new to R, I don't quite understand what the error messages are trying to tell me. I know that I need to change how I am specifying start conditions, but I don't know how.

Upvotes: 1

Views: 4916

Answers (2)

Sal Mangiafico

Reputation: 510

I see two problems in your nls call. 1) The formula uses the variables x and y, but your data frame has no columns with those names. They should be pre and post. 2) The magnitude of the values is giving nls trouble. It helps to divide both variables by 1,000,000.

pre <- c(946116, 1243227, 1259646, 1434124, 1575268, 2192526, 3252832, 6076519)  
post <- c(907355, 1553586, 1684253, 2592938, 1919173, 1702644,3173743, 3654198)

pre = pre/1000000
post = post/1000000

data <- data.frame(pre,post)

model <- nls(data=data, 
             formula=post~a*log(b*pre),
             start=list(a=1, b=1))

summary(model)
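
As a small illustration of the "NaNs produced" warning: log() returns NaN for negative input, so any trial value of b that is negative makes log(b * pre) unusable. The b values below are hypothetical, chosen only to show the effect; they are not values nls is known to have tried.

```r
# A few of the original (unscaled) pre values
pre <- c(946116, 1243227, 1259646)

# Hypothetical positive b: log(b * pre) is finite
ok <- log(2 * pre)

# Hypothetical negative b: log(b * pre) is NaN (R warns "NaNs produced")
bad <- suppressWarnings(log(-2 * pre))

all(is.finite(ok))
all(is.nan(bad))
```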

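As the other answer shows, though, rewriting the equation as a * log(pre) + b lets nls fit the model without rescaling the data. Note that b in this form is the additive constant a * log(b) of the original equation, not the original b.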

pre <- c(946116, 1243227, 1259646, 1434124, 1575268, 2192526, 3252832, 6076519)  
post <- c(907355, 1553586, 1684253, 2592938, 1919173, 1702644,3173743, 3654198)

data <- data.frame(pre,post)

model <- nls(data=data, 
             formula=post~a*log(pre)+b,
             start=list(a=1, b=0))

summary(model)
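
One reason the reparameterised form is easier to fit is that a * log(pre) + b is linear in a and b, so ordinary least squares can estimate it without any starting values. The sketch below is a cross-check rather than part of the answer above; the object names fit_nls and fit_lm are made up for the example.

```r
pre <- c(946116, 1243227, 1259646, 1434124, 1575268, 2192526, 3252832, 6076519)
post <- c(907355, 1553586, 1684253, 2592938, 1919173, 1702644, 3173743, 3654198)
data <- data.frame(pre, post)

# Nonlinear least squares on the reparameterised model
fit_nls <- nls(post ~ a * log(pre) + b, data = data,
               start = list(a = 1, b = 0))

# Ordinary least squares on the same model:
# the slope corresponds to a, the intercept to b
fit_lm <- lm(post ~ log(pre), data = data)

coef(fit_nls)
coef(fit_lm)
```

The two sets of coefficients should agree up to numerical tolerance, so lm() is a reasonable alternative when only this form of the curve is needed.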

Upvotes: 0

Dan

Reputation: 12074

Try this:

ggplot(data, aes(x=pre, y=post))+
  geom_point()+
  geom_smooth(method="nls", se=FALSE, formula=y~a*log(x)+k,
              method.args=list(start=c(a=1, k=1)))


Notice that it's essentially the same formula, but now k = a * log(b):

a * log(b * x) = a * {log(b) + log(x)} = a * log(x) + a * log(b) = a * log(x) + k
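
If the original parameter b is needed, the identity above can be inverted: since k = a * log(b), it follows that b = exp(k / a). A minimal sketch, assuming the fit converges from the same start values used in the plot call (the names fit, a_hat, k_hat and b_hat are made up for the example):

```r
pre <- c(946116, 1243227, 1259646, 1434124, 1575268, 2192526, 3252832, 6076519)
post <- c(907355, 1553586, 1684253, 2592938, 1919173, 1702644, 3173743, 3654198)
data <- data.frame(pre, post)

# Fit the reparameterised model y = a * log(x) + k
fit <- nls(post ~ a * log(pre) + k, data = data,
           start = list(a = 1, k = 1))

a_hat <- unname(coef(fit)["a"])
k_hat <- unname(coef(fit)["k"])

# Invert k = a * log(b) to recover b in y = a * log(b * x)
b_hat <- exp(k_hat / a_hat)

# Both parameterisations give the same fitted values
original_form <- a_hat * log(b_hat * data$pre)
reparameterised <- as.numeric(predict(fit))
all.equal(original_form, reparameterised)
```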

Upvotes: 2
