rocketman

Reputation: 49

R: Loop structure to use dynamically sized arrays to build linear models

With every iteration of the loop, I'd like to fit a linear model using one more point of historical data and see how, for example, the one-step-ahead prediction compares to the actual value. The code should be self-explanatory. The problem seems to be that Dependent and Independent are fixed in size after the first iteration (which I'd like to start at 10 data points, as shown in the code), whereas I'd like them to grow with each iteration.

output1 <- rep(0, 127)
output2 <- rep(0, 127)
ret <- function(x, y)
{
  for (i in 1:127)
  {
    Dependent <- y[1:(9+i)]
    Independent <- x[1:(9+i)]
    fit <- lm(Dependent ~ Independent)
    nextInput <- data.frame(Independent = x[(10+i)])
    prediction <- predict(fit, nextInput, interval="prediction")
    output1[i] <- prediction[2]
    output2[i] <- prediction[3]
  }
}

Upvotes: 0

Views: 42

Answers (1)

r2evans

Reputation: 160607

Here's a thought; let me know if I'm close to your intent:

set.seed(42)
n <- 100
x <- rnorm(n)
head(x)
# [1]  1.3709584 -0.5646982  0.3631284  0.6328626  0.4042683 -0.1061245
y <- runif(n)
head(y)
# [1] 0.8851177 0.5171111 0.8519310 0.4427963 0.1578801 0.4423246

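ret <- lapply(10:n, function(i) {
  # expanding window: use the first i observations
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  # one-step-ahead prediction interval; nothing left to predict when i == n
  pred <- if (i < n) {
    predict(fit, data.frame(indep = x[i + 1L]), interval = "prediction")
  } else NULL
  list(fit = fit, pred = pred)
})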

Note that I'm building a list of models and predictions with lapply instead of filling vectors in a for loop. Though not exactly the same situation, this answer does a decent job of explaining why this may be a good idea.
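
If a for loop is still preferred, here is a minimal sketch of one way it could look. One likely reason the loop in the question appears not to work is that output1 and output2 are assigned inside the function, so R modifies local copies and the global vectors stay at zero; the function also returns nothing useful. The helper name rolling_bounds and its start argument are hypothetical, introduced only for illustration, and the data are the same simulated x and y as above.

```r
# Hypothetical helper (not from the question): expanding-window fits with
# one-step-ahead prediction intervals, returned as a matrix.
rolling_bounds <- function(x, y, start = 10L) {
  n <- length(x)
  stopifnot(length(y) == n, n > start)
  steps <- start:(n - 1L)  # last fit must leave one point to predict
  # preallocate inside the function and return it at the end
  out <- matrix(NA_real_, nrow = length(steps), ncol = 3L,
                dimnames = list(NULL, c("fit", "lwr", "upr")))
  for (k in seq_along(steps)) {
    i <- steps[k]
    dat <- data.frame(dep = y[1:i], indep = x[1:i])
    fit <- lm(dep ~ indep, data = dat)
    # predict() returns a 1 x 3 matrix: fit, lwr, upr
    out[k, ] <- predict(fit, data.frame(indep = x[i + 1L]),
                        interval = "prediction")
  }
  out
}

set.seed(42)
x <- rnorm(100)
y <- runif(100)
bounds <- rolling_bounds(x, y)
dim(bounds)  # 90 rows (fits on 10..99 points), 3 columns
```

The key design choice is that the function returns its result instead of relying on assignment to variables defined outside it.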

Model and prediction from one of the runs:

ret[[50]]
# $fit
# Call:
# lm(formula = dep ~ indep)
# Coefficients:
# (Intercept)        indep  
#     0.44522      0.02691  
# $pred
#         fit        lwr      upr
# 1 0.4528911 -0.1160787 1.021861
summary(ret[[50]]$fit)
# Call:
# lm(formula = dep ~ indep)
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.42619 -0.22178 -0.00004  0.15550  0.53774 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.44522    0.03667  12.141   <2e-16 ***
# indep        0.02691    0.03186   0.845    0.402    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 0.2816 on 57 degrees of freedom
# Multiple R-squared:  0.01236, Adjusted R-squared:  -0.004966 
# F-statistic: 0.7134 on 1 and 57 DF,  p-value: 0.4018
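
To get back to the goal in the question (comparing each one-step-ahead prediction with the actual value), the predictions can be pulled out of the list and lined up against y. This is only a sketch under the same simulated data as above; the names preds, comparison, and n_obs are my own, and the coverage figure at the end is just one possible way to summarise the comparison.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- runif(n)

# same list of fits and predictions as above
ret <- lapply(10:n, function(i) {
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  pred <- if (i < n) {
    predict(fit, data.frame(indep = x[i + 1L]), interval = "prediction")
  } else NULL
  list(fit = fit, pred = pred)
})

# stack the 1-row prediction matrices; rbind ignores the final NULL
preds <- do.call(rbind, lapply(ret, `[[`, "pred"))
rownames(preds) <- NULL

# the model fitted on i points predicts observation i + 1
comparison <- data.frame(
  n_obs  = 10:(n - 1),
  actual = y[11:n],
  fit    = preds[, "fit"],
  lwr    = preds[, "lwr"],
  upr    = preds[, "upr"]
)
head(comparison)

# share of actual values that fall inside the prediction interval
mean(comparison$actual >= comparison$lwr & comparison$actual <= comparison$upr)
```

Here comparison$lwr and comparison$upr play the role of output1 and output2 from the question.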

Upvotes: 1
