Reputation: 109
I know python and C++ but have very little experience with R. I'm supposed to figure out what my old coworker's script does - he hasn't been here for several years but I have his files. He has about 10 python files that pass data into a temp file and then into the next python script, which I'm able to track, but he has one R script that I don't understand because I don't know R.
The input to the R script is temp4.txt:
1.414442 0.0043
1.526109 0.0042
1.600553 0.0046
1.637775 0.0045
...etc
Where column 1 is the x-axis of a growth curve (time units) and column 2 is growth level (units OD600, which is a measure of cell density).
The R script is only 4 lines:
inp1 <- scan('/temp4.txt', list(0,0))
decay <- data.frame(t = inp1[[1]], amp = inp1[[2]])
form <- nls(amp ~ const*(exp(fact*t)), data=decay, start = list(const = 0.01, fact = 0.5))
summary(form)
The R script's output:
Parameters:
Estimate Std. Error t value Pr(>|t|)
const 2.293e-03 9.658e-05 23.74 <2e-16 ***
fact 7.106e-01 8.757e-03 81.14 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002776 on 104 degrees of freedom
Correlation of Parameter Estimates:
const
fact -0.9905
Where the "fact" number is what he's pulling out in the next python script as the value to continue forward with analysis. Usually it's a positive value e.g. "6.649e-01 6.784e-01 6.936e-01 6.578e-01 6.949e-01 6.546e-01 0.6623768 0.6710339 6.952e-01 6.711e-01 6.721e-01 6.520e-01" but because temp file gets overwritten each time I only have one version of it with the negative value -0.9905 which he's throwing away negative values in the next python script.
I need to know what exactly he's doing to recreate it... I know the <- is passing data into an object, so it's the nls() function that's confusing me...
Thanks anyone who can explain the R for me.
Upvotes: 0
Views: 217
Reputation: 151
GSee's link in the comments is a good description of the NLS function, but in case you're not used to R documentation, here's a quick rundown of nls.
NLS function is nonlinear least-squares modelling function. It is similar to a linear regression, except that it is believed that at least one of the parameters is not linear (sine function, cosine function, x^2 function, etc.). Non-linear parameters can sometimes be transformed into a linear parameter (i.e. by doing a logarithm conversion or some such) but that's not always the case.
The first option is the model that is being tested: amp ~ const*(exp(fact*t)), means that we would like to model amp as the dependent variable and would like e^(fact*t) to be our independent (non-linear) variable.
The next option just tells us which data object to use (data = decay).
The start argument tells us the starting values to build the model off of, in this case const = .01 and fact = .5.
So the first command reads in the data to the object inp1. The second creates an object that has class data.frame (which is what most analysis in R is done on). This is basically a table with two columns. In this case the columns are given names (t and amp). The third command creates an object that has class nls. This object basically contains the information that the nls command generates. The fourth command prints out a summary of the nls-class object - basically all the pertinent details of the analysis.
The output reads as follows:
First the estimates and std. deviation for the two parameters, const and fact, in the nonlinear model. The t-value and P columns show you a statistical calculation of whether the parameters are statistically significantly different from 0.
Signif. codes is a legend for the stars showing to the right of the parameter estimates - what the p value is.
Residual standard error is an indicator of how much of the std. error is not explained by the model.
This last one I'm not sure about, as I haven't used NLS in a while, but I think it's right. Correlation of estimates shows how strongly correlated the parameters are. In this case, a -.9905 value is a very strong negative correlation - as fact goes up, const goes down and it's very predictable.
Upvotes: 1
Reputation: 49660
The first line reads the data into R
The second line restructures the data into a data frame (a table structure that is used commonly in R, it will be passed as the data to nls
in line 3).
This looks like older code, most modern coders would replace lines 1 and 2 with one call to read.table
.
Line 3 fits a non-linear least squares equation to the data read in previously and line 4 prints the summary of the fit including the estimates of the parameters for the next python script to read.
The non-linear model that is being fit is an exponential growth curve and the fact parameter is a measure of the rate of growth.
Upvotes: 1