lm() dataframes and for-loop

Question

I have a data frame DFMyDataBase:

DATE          AUDCAD      AUDCHF       AUDJPY     AUDNZD      (...)
05/01/2017  0.965960    0.742230    85.315000   1.048500      (...)
08/01/2017  0.971760    0.746410    85.353000   1.048140      (...)
09/01/2017  0.975070    0.749300    85.307000   1.054290      (...)
10/01/2017  0.980720    0.754540    85.873000   1.054380      (...)
11/01/2017  0.983750    0.756540    85.861000   1.053650      (...)
12/01/2017  0.983320    0.756070    85.822000   1.051750      (...)
(...)

And a data frame DFLM:

FirstSymbol     SecondSymbol    PValue     DickeyFullerCV
     AUDCAD           AUDCHF
     AUDCAD           AUDJPY
     AUDCAD           AUDNZD
     AUDCAD           AUDUSD
      (...)            (...)

I want to build lm() models based on the pairs of names stored in DFLM, which correspond to the column names in DFMyDataBase. The first column of DFLM represents the dependent variable and the second column the independent variable.

Allan Cameron · Accepted Answer

It sounds like you wish to build formulas based on the pairs of names stored in DFLM, which represent the column names in DFMyDataBase, then use these formulas as the basis for running lm on each pair. I am further guessing that you want the first column of DFLM to represent the dependent variable and the second column to be the independent variable.

That being the case, you could do something like:

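# Build one formula per row of DFLM: FirstSymbol ~ SecondSymbol
f_list <- apply(DFLM, 1, function(x) as.formula(paste(x[1], "~", x[2])))

# Fit each model; eval(call(...)) keeps the printed Call readable
models <- lapply(f_list, function(x) {
  eval(call("lm", formula = x, data = quote(DFMyDataBase)))
})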

So now models is a list of lm objects, one for each row of DFLM, allowing you to do:

models[[1]]
#> 
#> Call:
#> lm(formula = AUDCAD ~ AUDCHF, data = DFMyDataBase)
#> 
#> Coefficients:
#> (Intercept)       AUDCHF  
#>     0.06269      1.21739

or

summary(models[[3]])
#> 
#> Call:
#> lm(formula = AUDCHF ~ AUDNZD, data = DFMyDataBase)
#> 
#> Residuals:
#>          1          2          3          4          5          6 
#> -0.0037091  0.0010088 -0.0052919 -0.0001864  0.0029046  0.0052740 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  -0.8210     0.7342  -1.118    0.326  
#> AUDNZD        1.4944     0.6980   2.141    0.099 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.004446 on 4 degrees of freedom
#> Multiple R-squared:  0.534,  Adjusted R-squared:  0.4175 
#> F-statistic: 4.583 on 1 and 4 DF,  p-value: 0.09899

and so on.
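
If the goal is to fill the empty PValue column of DFLM from these models, something along the following lines might work. This is only a sketch: it assumes PValue is meant to hold the p-value of the slope coefficient, which the question does not state, and it leaves DickeyFullerCV alone because that would need a separate unit-root test. The data are a trimmed copy of the sample used in this answer so that the block runs on its own.

```r
# Trimmed copy of the sample data so the sketch is self-contained
DFMyDataBase <- data.frame(
  AUDCAD = c(0.96596, 0.97176, 0.97507, 0.98072, 0.98375, 0.98332),
  AUDCHF = c(0.74223, 0.74641, 0.7493, 0.75454, 0.75654, 0.75607),
  AUDJPY = c(85.315, 85.353, 85.307, 85.873, 85.861, 85.822),
  AUDNZD = c(1.0485, 1.04814, 1.05429, 1.05438, 1.05365, 1.05175)
)
DFLM <- data.frame(
  FirstSymbol  = c("AUDCAD", "AUDCAD", "AUDCHF"),
  SecondSymbol = c("AUDCHF", "AUDJPY", "AUDNZD"),
  stringsAsFactors = FALSE
)

# Same approach as above: one formula and one fitted model per row
f_list <- apply(DFLM, 1, function(x) as.formula(paste(x[1], "~", x[2])))
models <- lapply(f_list, function(x) {
  eval(call("lm", formula = x, data = quote(DFMyDataBase)))
})

# ASSUMPTION: PValue should hold the p-value of the slope.
# In the coefficient table, row 1 is the intercept and row 2 is the slope.
DFLM$PValue <- vapply(
  models,
  function(m) coef(summary(m))[2, "Pr(>|t|)"],
  numeric(1)
)

DFLM
```

Using vapply with numeric(1) rather than sapply makes the result type explicit, so a model that fails to return a single number raises an error instead of silently producing a list.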

Note that in your sample data frame your fourth row contains the same column name twice - it's not clear what you intend for this type of situation. I have altered the input a bit as shown below.

Data

DFMyDataBase <- structure(list(DATE = c("05/01/2017", "08/01/2017",
"09/01/2017", "10/01/2017", "11/01/2017", "12/01/2017"), AUDCAD = c(0.96596, 
0.97176, 0.97507, 0.98072, 0.98375, 0.98332), AUDCHF = c(0.74223, 
0.74641, 0.7493, 0.75454, 0.75654, 0.75607), AUDJPY = c(85.315, 
85.353, 85.307, 85.873, 85.861, 85.822), AUDNZD = c(1.0485, 1.04814, 
1.05429, 1.05438, 1.05365, 1.05175)), class = "data.frame", row.names = c(NA, 
-6L))

DFLM <- structure(list(FirstSymbol = c("AUDCAD", "AUDCAD", "AUDCHF"), 
    SecondSymbol = c("AUDCHF", "AUDJPY", "AUDNZD")), row.names = c(NA, 
3L), class = "data.frame")

DFMyDataBase
#>         DATE  AUDCAD  AUDCHF AUDJPY  AUDNZD
#> 1 05/01/2017 0.96596 0.74223 85.315 1.04850
#> 2 08/01/2017 0.97176 0.74641 85.353 1.04814
#> 3 09/01/2017 0.97507 0.74930 85.307 1.05429
#> 4 10/01/2017 0.98072 0.75454 85.873 1.05438
#> 5 11/01/2017 0.98375 0.75654 85.861 1.05365
#> 6 12/01/2017 0.98332 0.75607 85.822 1.05175

DFLM
#>   FirstSymbol SecondSymbol
#> 1      AUDCAD       AUDCHF
#> 2      AUDCAD       AUDJPY
#> 3      AUDCHF       AUDNZD
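
Since the question title mentions a for-loop, here is a loop-based sketch that should give the same set of models as the apply/lapply version. It swaps in reformulate() from base R to build each formula instead of paste() plus as.formula(); the list names built with paste() are purely illustrative and not part of the original answer. The data are again a trimmed copy of the sample above so the block runs on its own.

```r
# Trimmed copy of the sample data so the sketch is self-contained
DFMyDataBase <- data.frame(
  AUDCAD = c(0.96596, 0.97176, 0.97507, 0.98072, 0.98375, 0.98332),
  AUDCHF = c(0.74223, 0.74641, 0.7493, 0.75454, 0.75654, 0.75607),
  AUDJPY = c(85.315, 85.353, 85.307, 85.873, 85.861, 85.822),
  AUDNZD = c(1.0485, 1.04814, 1.05429, 1.05438, 1.05365, 1.05175)
)
DFLM <- data.frame(
  FirstSymbol  = c("AUDCAD", "AUDCAD", "AUDCHF"),
  SecondSymbol = c("AUDCHF", "AUDJPY", "AUDNZD"),
  stringsAsFactors = FALSE
)

# Pre-allocate the result list, then fit one model per row of DFLM
models <- vector("list", nrow(DFLM))
for (i in seq_len(nrow(DFLM))) {
  # reformulate() builds the formula FirstSymbol ~ SecondSymbol for row i
  f <- reformulate(DFLM$SecondSymbol[i], response = DFLM$FirstSymbol[i])
  models[[i]] <- lm(f, data = DFMyDataBase)
}

# Illustrative names (an assumption, not in the original answer)
names(models) <- paste(DFLM$FirstSymbol, DFLM$SecondSymbol, sep = "~")

coef(models[["AUDCAD~AUDCHF"]])
```

One difference from the eval(call(...)) version: here the printed Call of each model shows `formula = f` rather than the expanded formula, although the fitted coefficients are the same.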
