Reputation: 29
I am trying to run this code from this post: looping with iterations over two lists of variables for a multiple regression in R with modified variable and data frame names, because it seems to do exactly what I want and uses a very similar dataset. However, it keeps giving me an error and I don't know why, so I would really appreciate if someone could help me to understand the error or the corresponding line of code so I could try to figure out what's wrong.
for(i in 1:n) {
vars = names(output)[names(output) %in% paste0(c(".PRE", ".POST"), i)]
models[[as.character(i)]] = lm(paste("growth_rate ~ ", paste(vars, collapse=" + ")),
data = output)
}
Error in parse(text = x, keep.source = FALSE) :
<text>:2:0: unexpected end of input
1: growth_rate ~
^
My dataset looks almost like the one given in the above mentioned post besides the fact that my "RDPI_T" and "DRY_T" variables are in an alternating order (which I dont think matters in this case). The analogous variables I have are 69 PRE variables called id1.PRE, id2.PRE ... id69.PRE and also 69 POST variables called id1.POST, id2.POST ... id69.POST in the output dataset. Also, growth_rate is in the same dataset called output.
Additionally, I would also like to add 2 more independent variables that are regular and do not come from a list: country and year but I am unsure how to incorporate that here?
Any help would be appreciated. Thank you!
Upvotes: 1
Views: 173
Reputation: 46978
If your columns are called id1.PRE, id2.PRE, then the paste function you have above will not work, which most likely throws the error.
Please do dput(head(output))
and paste the output, this allows us to see the column names and why it doesn't work.
Try something below,according to how you describe the column names:
#simulate data
output=data.frame(
"growth_rate"=rnorm(100),
matrix(rnorm(100*69*2),nrow=100)
)
colnames(output)[-1] = c(paste("id",1:69,".PRE",sep=""),paste("id",1:69,".POST",sep=""))
output$year = 1901:2000
output$country = sample(letters,nrow(output),replace=TRUE)
n=69
#create list to hold models
models = vector("list",n)
for(i in 1:n) {
vars = paste0("id",i,c(".PRE", ".POST"))
# i think it works without as.formula, but better to be safe
FORMULA = as.formula(paste("growth_rate ~ ", paste(vars, collapse=" + ")))
models[[i]] = lm(FORMULA,data = output)
}
If you want to include other variables:
for(i in 1:n) {
vars = paste0("id",i,c(".PRE", ".POST"))
# add other variables
vars = c(vars,"country","year")
FORMULA = paste("growth_rate ~ ", paste(vars, collapse=" + "))
models[[i]] = lm(FORMULA,data = output)
}
Upvotes: 1