Greg Ferraro

Reputation: 11

How do I add new columns to a data set for each regression loop iteration?

I'm trying to test the predictive power of a model. I split the observations into a test group (1/4) and a train group (3/4), run a first-order regression on the training sample, and use the resulting coefficients to produce predicted values from the independent variable in the test sample. I would then like to add these predicted values as a new column to the dependent variable test data on each iteration of the loop.

For context: TSIP500 is the full sample, iv is the independent variable, and dv is the dependent variable. The maximum of 50 iterations is just a test value that keeps the number of iterations manageable.

I was having trouble with the predict function, so I computed the equation manually. My code is below:

for(i in 1:50){
  test_index <- sample(nrow(TSIP500iv), (1/4)*nrow(TSIP500iv), replace=FALSE)
  train_500iv <- TSIP500[-test_index,"distance"]
  test_500iv <- TSIP500[test_index,"distance"]
  train_500dv <- TSIP500[-test_index,"percent_of_max"]
  test_500dv <- TSIP500[test_index,"percent_of_max"]
  reg_model <- lm(train_500dv~train_500iv)
  int <- reg_model$coeff[1]
  B1 <- reg_model$coeff[2]
  predicted <- (int + B1*test_500iv)
  predicted <- data.frame(predicted)
  test_500dv <- data.frame(test_500dv)
  test_500dv[,i] <- apply(predicted)
}

I've tried different approaches for the last line, but I always end up with just a single column added. Any help would be tremendously appreciated.

Upvotes: 1

Views: 53

Answers (1)

mr_swap

Reputation: 341

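# Create the container once, outside the loop, so columns accumulate.
# Note: each iteration draws a different test sample, so row k of pred_1
# and row k of pred_2 generally refer to different observations.
pred_cols <- list()
for(i in 1:50){
  test_index <- sample(nrow(TSIP500iv), (1/4)*nrow(TSIP500iv), replace=FALSE)
  train_500iv <- TSIP500[-test_index,"distance"]
  test_500iv <- TSIP500[test_index,"distance"]
  train_500dv <- TSIP500[-test_index,"percent_of_max"]
  test_500dv <- TSIP500[test_index,"percent_of_max"]
  reg_model <- lm(train_500dv~train_500iv)
  int <- reg_model$coeff[1]
  B1 <- reg_model$coeff[2]
  temp_results <- paste('pred',i,sep='_')
  # Store this iteration's predictions under the name pred_i
  pred_cols[[temp_results]] <- int + B1*test_500iv
}
# Bind all prediction columns (pred_1 ... pred_50) into one data frame
all_preds <- as.data.frame(pred_cols)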
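
Another way to organize the same loop, as a rough sketch rather than something tested on your data: pre-allocate a matrix with one row per observation and one column per iteration, then fill in only the test rows on each pass. Fitting with `data =` and real column names also lets `predict()` work, since it matches the columns of `newdata` against the variable names in the formula. The `TSIP500` built below is synthetic stand-in data, and `n_iter`, `preds`, and `results` are names I made up for illustration.

```r
# Synthetic stand-in for TSIP500 (assumption: a data frame with these two columns)
set.seed(1)
TSIP500 <- data.frame(distance = runif(200, 0, 100))
TSIP500$percent_of_max <- 90 - 0.5 * TSIP500$distance + rnorm(200, sd = 5)

n_iter <- 50
n <- nrow(TSIP500)

# One row per observation, one column per iteration; a cell stays NA when
# the observation was in the training set for that iteration.
preds <- matrix(NA_real_, nrow = n, ncol = n_iter,
                dimnames = list(NULL, paste("pred", seq_len(n_iter), sep = "_")))

for (i in seq_len(n_iter)) {
  test_index <- sample(n, floor(n / 4))
  # Fit on the training rows, referring to columns by name
  reg_model <- lm(percent_of_max ~ distance, data = TSIP500[-test_index, ])
  # Predict for the held-out rows and store them in column i
  preds[test_index, i] <- predict(reg_model, newdata = TSIP500[test_index, ])
}

# Observed values alongside the per-iteration predictions
results <- cbind(TSIP500["percent_of_max"], preds)
```

Because every column is aligned to the rows of the full data set, you can compare `results$percent_of_max` against any `pred_i` column directly, ignoring the NA entries.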

Upvotes: 0
