Reputation: 31
I would like to know how to refer to loop elements inside a loop in R. In STATA, this is done with `var' inside a loop. I am working with loops, and I want to refer to the looped-over names inside each iteration while regressing those variables on a list of regressors (x1 x2 x3). The x1 variable names also carry suffixes, so each name is built from several shorter parts. In STATA, I would write:
foreach credit in "short_term" "medium_term" "long_term" {
    foreach percentile in "p50" "p75" "p90" {
        foreach type in "high4" "high5" "high6" {
            reg y_`credit' x1_`percentile'_`type' x2 x3
        }
    }
}
In R, if I create vectors of these name parts and loop over them, how do I refer to each element inside the loop? For instance:
credit <- c("short_term","medium_term","long_term")
percentile <- c("p50","p75","p90")
type <- c("high4","high5","high6")
for (c in credit) {
  for (p in percentile) {
    for (t in type) {
      # pseudocode: this is the part I don't know how to write in R
      baseline_[c]_[p]_[t] <- lm(y_[c] ~ x1_[p]_[t] + x2 + x3)
    }
  }
}
I would then like to use sink() to write all the results (summary(baseline) for every baseline model) to a single .txt file.
I hope this illustration explains my question. I am struggling with loops in R because of this issue, which seems minor compared with STATA's `var' syntax.
I await your response.
Thank you, Pranav
Upvotes: 0
Views: 1010
Reputation: 10855
One can use the formula() function to generate formulas from strings in R.
Since the OP's example isn't reproducible, we'll demonstrate formula() with the mtcars data set:
data(mtcars) # Use motor trend cars data set
dvs <- c("mpg","qsec")
ivs <- c("am","wt","disp")
for(d in dvs){
  for(i in ivs){
    message(paste("d is: ", d, "i is: ",i))
    print(summary(lm(formula(paste(d,"~",i)),mtcars)))
  }
}
...and the first part of the output:
> for(d in dvs){
+ for(i in ivs){
+ message(paste("d is: ", d, "i is: ",i))
+ print(summary(lm(formula(paste(d,"~",i)),mtcars)))
+ }
+ }
d is: mpg i is: am
Call:
lm(formula = formula(paste(d, "~", i)), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-9.3923 -3.0923 -0.2974 3.2439 9.5077
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.147 1.125 15.247 1.13e-15 ***
am 7.245 1.764 4.106 0.000285 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Since the output from lm() can be saved in an object, one can also build a list() of model objects and manipulate them further in R.
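As a minimal sketch of that idea, still using mtcars, the loop below stores each fitted model in a named list instead of printing it, then writes every summary to one text file with sink(), which is what the OP asked for. The list name `models`, the `dv_iv` naming scheme, and the use of tempfile() for the output path are illustrative choices, not part of the original answer; in practice you would pass your own file name. Note that sink() only diverts standard output, so the labels are written with cat(): message() writes to stderr and would not end up in the file.

```r
data(mtcars)

dvs <- c("mpg", "qsec")
ivs <- c("am", "wt", "disp")

# Named list that will hold one lm object per (dv, iv) pair
models <- list()

for (d in dvs) {
  for (i in ivs) {
    f <- formula(paste(d, "~", i))
    fit <- lm(f, data = mtcars)
    # Store the evaluated formula in the call so summary() prints
    # e.g. "mpg ~ am" instead of "formula(paste(d, "~", i))"
    fit$call$formula <- f
    models[[paste0(d, "_", i)]] <- fit
  }
}

# Illustrative output path; replace with e.g. "results.txt"
out_file <- tempfile(fileext = ".txt")
sink(out_file)                       # divert printed output to the file
for (nm in names(models)) {
  cat("=====", nm, "=====\n")       # cat() goes to stdout, so sink() captures it
  print(summary(models[[nm]]))
}
sink()                               # restore output to the console
```

Keeping the models in a list also means later steps (coefficient tables, comparisons) can loop over `models` without refitting.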
To build variable names for the formula() call from vectors that contain the parts of the desired names, one can use the paste() or paste0() functions, as in the mtcars example above. paste0() places nothing between its arguments, whereas paste() separates them with a single space by default (sep = " ").
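For instance, the calls below build the OP's variable names from single elements of the OP's vectors; paste() needs sep = "_" to avoid its default space, while paste0() needs the underscores written into the pieces.

```r
credit     <- "short_term"
percentile <- "p50"
type       <- "high4"

# paste() inserts sep = " " by default, so set the separator explicitly
paste("x1", percentile, type, sep = "_")   # "x1_p50_high4"
paste("x1", percentile, type)              # "x1 p50 high4" (spaces would break the formula)

# paste0() inserts nothing between arguments
paste0("x1_", percentile, "_", type)       # "x1_p50_high4"
paste0("y_", credit)                       # "y_short_term"
```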
Making some guesses about the OP's intended formulae, we'll use the OP's nested for() loops to generate strings that can be passed to formula() in an lm() call.
#
# generate formulas using content from OP
#
credit <- c("short_term","medium_term","long_term")
percentile <- c("p50","p75","p90")
type <- c("high4","high5","high6")
for (c in credit) {
  for (p in percentile) {
    for (t in type) {
      aFormula <- paste0("y_",c," ~ x1_",p,"_",t," + x2 + x3")
      print(aFormula)
    }
  }
}
...and the start of the output:
> credit <- c("short_term","medium_term","long_term")
> percentile <- c("p50","p75","p90")
> type <- c("high4","high5","high6")
>
> for (c in credit) {
+ for (p in percentile) {
+ for (t in type) {
+ aFormula <- paste0("y_",c," ~ x1_",p,"_",t," + x2 + x3")
+ print(aFormula)
+ }
+ }
+ }
[1] "y_short_term ~ x1_p50_high4 + x2 + x3"
[1] "y_short_term ~ x1_p50_high5 + x2 + x3"
[1] "y_short_term ~ x1_p50_high6 + x2 + x3"
[1] "y_short_term ~ x1_p75_high4 + x2 + x3"
[1] "y_short_term ~ x1_p75_high5 + x2 + x3"
[1] "y_short_term ~ x1_p75_high6 + x2 + x3"
[1] "y_short_term ~ x1_p90_high4 + x2 + x3"
[1] "y_short_term ~ x1_p90_high5 + x2 + x3"
[1] "y_short_term ~ x1_p90_high6 + x2 + x3"
[1] "y_medium_term ~ x1_p50_high4 + x2 + x3"
[1] "y_medium_term ~ x1_p50_high5 + x2 + x3"
[1] "y_medium_term ~ x1_p50_high6 + x2 + x3"
.
.
.
Note that the OP inconsistently uses - and _ in the variable names, so I used _ at all relevant spots in the formulae.
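Putting the pieces together for the OP's setup, a minimal sketch could look like the following. Because the OP's data isn't available, it simulates a hypothetical data frame `df` with the column names implied by the question (y_short_term, x1_p50_high4, x2, x3, and so on); substitute the real data. Instead of pasting the whole formula string, it uses base R's reformulate(), which builds a formula from a character vector of term labels plus a response name; the paste0() approach above works equally well. The loop variables are named cr, pc and ty to avoid shadowing base R's c() and t(), and the list name `baseline` and the temporary output file are illustrative.

```r
set.seed(1)

credit     <- c("short_term", "medium_term", "long_term")
percentile <- c("p50", "p75", "p90")
type       <- c("high4", "high5", "high6")

# Hypothetical simulated data with the column names implied by the question
n <- 100
x1_names <- as.vector(outer(percentile, type,
                            function(p, t) paste0("x1_", p, "_", t)))
cols <- c(paste0("y_", credit), x1_names, "x2", "x3")
df <- as.data.frame(matrix(rnorm(n * length(cols)), ncol = length(cols),
                           dimnames = list(NULL, cols)))

baseline <- list()
for (cr in credit) {
  for (pc in percentile) {
    for (ty in type) {
      # e.g. y_short_term ~ x1_p50_high4 + x2 + x3
      f <- reformulate(c(paste0("x1_", pc, "_", ty), "x2", "x3"),
                       response = paste0("y_", cr))
      fit <- lm(f, data = df)
      fit$call$formula <- f            # show the real formula in summary()
      baseline[[paste(cr, pc, ty, sep = "_")]] <- fit
    }
  }
}

# Illustrative path; use e.g. "baseline_results.txt" in practice
out_file <- tempfile(fileext = ".txt")
sink(out_file)
for (nm in names(baseline)) {
  cat("=====", nm, "=====\n")
  print(summary(baseline[[nm]]))
}
sink()
```

Each model can then be retrieved by name, e.g. baseline[["short_term_p50_high4"]], which plays the role of the baseline_[c]_[p]_[t] objects in the question.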
Upvotes: 1