Pranav Garg

Reputation: 31

Referencing element in a loop in R

I would like to know how to refer to elements in a loop in R. In STATA, this is done with `var' inside a loop. I am working with loops and want to refer to the loop variables inside each loop while regressing the resulting variables on a list of variables (x1 x2 x3). The x1 variable also has suffixes, so its name is assembled from several shorter parts. The code I would write in STATA is:

foreach credit in "short_term" "medium_term" "long_term" {
    foreach percentile in "p50" "p75" "p90" {
        foreach type in "high4" "high5" "high6" {
            reg y_`credit' x1_`percentile'_`type' x2 x3
        }
    }
} 

In R, if I create a list and make a loop, how do I refer to each element in the list? For instance:

credit <- c("short_term","medium_term","long_term") 
percentile <- c("p50","p75","p90") 
type <- c("high4","high5","high6") 

for (c in credit) {
    for (p in percentile) {
        for (t in type) {
            baseline_[c]_[p]_[t] <- lm(y_[c] ~ x1_[p]_[t] + x2 + x3)
        } 
     }
}

I would then like to use sink to write all the results (summary(baseline) for every baseline model) to a single .txt file.

I hope my illustration was adequate in explaining my doubt. I am struggling with loops because of this (minor - when compared to STATA's `var') issue.

I await your response.

Thank you, Pranav

Upvotes: 0

Views: 1010

Answers (1)

Len Greski

Reputation: 10855

One can use the formula() function to generate formulas from strings in R.

Since the OP isn't reproducible, we'll demonstrate formula() by using the mtcars data set:

data(mtcars) # Use motor trend cars data set
dvs <- c("mpg","qsec")
ivs <- c("am","wt","disp")
for(d in dvs){
     for(i in ivs){
          message(paste("d is: ", d, "i is: ",i))
          print(summary(lm(formula(paste(d,"~",i)),mtcars)))
     }
}

...and the first part of the output:

> for(d in dvs){
+      for(i in ivs){
+           message(paste("d is: ", d, "i is: ",i))
+           print(summary(lm(formula(paste(d,"~",i)),mtcars)))
+      }
+ }
d is:  mpg i is:  am

Call:
lm(formula = formula(paste(d, "~", i)), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.3923 -3.0923 -0.2974  3.2439  9.5077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   17.147      1.125  15.247 1.13e-15 ***
am             7.245      1.764   4.106 0.000285 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared:  0.3598,    Adjusted R-squared:  0.3385 
F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Since the output from lm() can be saved in an object, one can also generate a list() of model objects, and manipulate them further in R.
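
As a minimal sketch of that idea, the same mtcars loop can store each fitted model in a named list instead of printing it. The list name models and the naming scheme (dependent variable, underscore, independent variable) are my own choices for illustration, not something taken from the OP.

```r
data(mtcars)
dvs <- c("mpg", "qsec")
ivs <- c("am", "wt", "disp")

models <- list()   # hypothetical container name, chosen for this sketch
for (d in dvs) {
  for (i in ivs) {
    # build the formula from strings and store the fit under a descriptive name
    model_name <- paste(d, i, sep = "_")
    models[[model_name]] <- lm(formula(paste(d, "~", i)), data = mtcars)
  }
}

# the list can now be queried or processed like any other R list
model_names <- names(models)
r_squared <- sapply(models, function(m) summary(m)$r.squared)
print(summary(models[["mpg_wt"]]))
```

Keeping the models in one list avoids creating many separately named objects, and functions such as lapply() or sapply() can then be applied to all fits at once.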

To generate named variables for the formula() statement from vectors containing elements of the desired variable names, one can use the paste() or paste0() functions in a manner similar to the approach taken above with the mtcars data set. paste0() defaults to no separator between its arguments, whereas paste() defaults to inserting a single space between them.
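
To make the difference concrete, here is a small sketch. The variable names credit_i, with_space, no_space and with_sep are made up for this illustration.

```r
credit_i <- "short_term"   # hypothetical stand-in for one loop value

with_space <- paste("y_", credit_i)            # default sep = " "
no_space   <- paste0("y_", credit_i)           # no separator
with_sep   <- paste("y", credit_i, sep = "_")  # explicit separator

print(with_space)   # "y_ short_term" (contains an unwanted space)
print(no_space)     # "y_short_term"
print(with_sep)     # "y_short_term"
```

Either of the last two forms produces a usable variable name; the first does not, because of the embedded space.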

Again, making some guesses as to the actual intended formulae, we'll use the OP's nested for() loops to generate strings that can be used with formula() in an lm() function.

# 
# generate formulas using content from OP
# 
credit <- c("short_term","medium_term","long_term") 
percentile <- c("p50","p75","p90") 
type <- c("high4","high5","high6") 

for (c in credit) {
     for (p in percentile) {
          for (t in type) {
               aFormula <- paste0("y_",c," ~ x1_",p,"_",t," + x2 + x3")
               print(aFormula)
          } 
     }
}

...and the start of the output:

> credit <- c("short_term","medium_term","long_term") 
> percentile <- c("p50","p75","p90") 
> type <- c("high4","high5","high6") 
> 
> for (c in credit) {
+      for (p in percentile) {
+           for (t in type) {
+                aFormula <- paste0("y_",c," ~ x1_",p,"_",t," + x2 + x3")
+                print(aFormula)
+           } 
+      }
+ }
[1] "y_short_term ~ x1_p50_high4 + x2 + x3"
[1] "y_short_term ~ x1_p50_high5 + x2 + x3"
[1] "y_short_term ~ x1_p50_high6 + x2 + x3"
[1] "y_short_term ~ x1_p75_high4 + x2 + x3"
[1] "y_short_term ~ x1_p75_high5 + x2 + x3"
[1] "y_short_term ~ x1_p75_high6 + x2 + x3"
[1] "y_short_term ~ x1_p90_high4 + x2 + x3"
[1] "y_short_term ~ x1_p90_high5 + x2 + x3"
[1] "y_short_term ~ x1_p90_high6 + x2 + x3"
[1] "y_medium_term ~ x1_p50_high4 + x2 + x3"
[1] "y_medium_term ~ x1_p50_high5 + x2 + x3"
[1] "y_medium_term ~ x1_p50_high6 + x2 + x3"
. 
. 
. 

Note that the content in the OP inconsistently uses - vs. _, so I used _ at all relevant spots in the formulae.
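
Putting the pieces together, the following is a rough sketch of how the models could be fitted, stored, and written to a text file with sink(). Since the OP's data are not available, the data frame dat is simulated random noise with the column names the formulas expect. The names dat, baseline, cr, tp and the output file name are all assumptions made for this example.

```r
set.seed(1)   # simulated data only; purely illustrative
credit     <- c("short_term", "medium_term", "long_term")
percentile <- c("p50", "p75", "p90")
type       <- c("high4", "high5", "high6")

# hypothetical data frame containing every column the formulas refer to
n <- 50
x1_suffixes <- as.vector(outer(percentile, type, paste, sep = "_"))
col_names <- c(paste0("y_", credit), paste0("x1_", x1_suffixes), "x2", "x3")
dat <- as.data.frame(matrix(rnorm(n * length(col_names)), nrow = n,
                            dimnames = list(NULL, col_names)))

# fit one model per combination and keep it in a named list
baseline <- list()
for (cr in credit) {
  for (p in percentile) {
    for (tp in type) {
      f <- formula(paste0("y_", cr, " ~ x1_", p, "_", tp, " + x2 + x3"))
      baseline[[paste(cr, p, tp, sep = "_")]] <- lm(f, data = dat)
    }
  }
}

# write every summary to one text file; replace the path with your own
out_file <- file.path(tempdir(), "baseline_results.txt")
sink(out_file)
for (nm in names(baseline)) {
  cat("\n=====", nm, "=====\n")   # label each model explicitly
  print(summary(baseline[[nm]]))
}
sink()   # restore output to the console
```

The explicit label before each summary is there because the Call: line in the printed summary shows the expression passed to lm() (here, f) rather than the expanded formula, so the summaries would otherwise be hard to tell apart.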

Upvotes: 1
