Reputation: 19
I have a data frame DFMyDataBase:
DATE AUDCAD AUDCHF AUDJPY AUDNZD (...)
05/01/2017 0.965960 0.742230 85.315000 1.048500 (...)
08/01/2017 0.971760 0.746410 85.353000 1.048140 (...)
09/01/2017 0.975070 0.749300 85.307000 1.054290 (...)
10/01/2017 0.980720 0.754540 85.873000 1.054380 (...)
11/01/2017 0.983750 0.756540 85.861000 1.053650 (...)
12/01/2017 0.983320 0.756070 85.822000 1.051750 (...)
(...)
And a data frame DFLM:
FirstSymbol SecondSymbol PValue DickeyFullerCV
AUDCAD AUDCHF
AUDCAD AUDJPY
AUDCAD AUDNZD
AUDCAD AUDUSD
(...) (...)
I would like to build lm() models based on the pairs of names stored in DFLM, which represent the column names in DFMyDataBase. The first column of DFLM represents the dependent variable and the second column the independent variable.
Upvotes: 0
Views: 57
Reputation: 173858
It sounds like you wish to build formulas based on the pairs of names stored in DFLM, which represent the column names in DFMyDataBase, then use these formulae as the basis for running lm on each pair. I am further guessing that you want the first column of DFLM to represent the dependent variable and the second column to be the independent variable.
That being the case, you could do something like
f_list <- apply(DFLM, 1, function(x) as.formula(paste(x[1], "~", x[2])))
models <- lapply(f_list, function(x) {
  eval(call("lm", formula = x, data = quote(DFMyDataBase)))
})
So now models is a list of lm objects, one for each row of DFLM, allowing you to do:
models[[1]]
#>
#> Call:
#> lm(formula = AUDCAD ~ AUDCHF, data = DFMyDataBase)
#>
#> Coefficients:
#> (Intercept)       AUDCHF
#>     0.06269      1.21739
or
summary(models[[3]])
#>
#> Call:
#> lm(formula = AUDCHF ~ AUDNZD, data = DFMyDataBase)
#>
#> Residuals:
#>          1          2          3          4          5          6
#> -0.0037091  0.0010088 -0.0052919 -0.0001864  0.0029046  0.0052740
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -0.8210     0.7342  -1.118    0.326
#> AUDNZD        1.4944     0.6980   2.141    0.099 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.004446 on 4 degrees of freedom
#> Multiple R-squared: 0.534, Adjusted R-squared: 0.4175
#> F-statistic: 4.583 on 1 and 4 DF, p-value: 0.09899
and so on.
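If the goal is to get one row of results per pair back into a table shaped like DFLM, the fitted models can be summarised in a loop. The following is only a sketch under stated assumptions: the column names Intercept, Slope, SlopePValue and RSquared are made up for illustration, and SlopePValue is the t-test p-value of the slope, which may not be what the PValue column in the question is meant to hold. The DickeyFullerCV column is not computed here.

```r
# Sample data copied from the Data section of this answer (DATE column omitted)
DFMyDataBase <- data.frame(
  AUDCAD = c(0.96596, 0.97176, 0.97507, 0.98072, 0.98375, 0.98332),
  AUDCHF = c(0.74223, 0.74641, 0.7493, 0.75454, 0.75654, 0.75607),
  AUDJPY = c(85.315, 85.353, 85.307, 85.873, 85.861, 85.822),
  AUDNZD = c(1.0485, 1.04814, 1.05429, 1.05438, 1.05365, 1.05175)
)
DFLM <- data.frame(
  FirstSymbol  = c("AUDCAD", "AUDCAD", "AUDCHF"),
  SecondSymbol = c("AUDCHF", "AUDJPY", "AUDNZD"),
  stringsAsFactors = FALSE
)

# Fit the models as in the answer above
f_list <- apply(DFLM, 1, function(x) as.formula(paste(x[1], "~", x[2])))
models <- lapply(f_list, function(x) {
  eval(call("lm", formula = x, data = quote(DFMyDataBase)))
})

# Collect one row of statistics per model (column names are illustrative)
results <- do.call(rbind, lapply(seq_along(models), function(i) {
  s  <- summary(models[[i]])
  co <- s$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(
    FirstSymbol  = DFLM$FirstSymbol[i],
    SecondSymbol = DFLM$SecondSymbol[i],
    Intercept    = co[1, "Estimate"],
    Slope        = co[2, "Estimate"],
    SlopePValue  = co[2, "Pr(>|t|)"],  # assumption: p-value of the slope
    RSquared     = s$r.squared,
    stringsAsFactors = FALSE
  )
}))

results
```

The rows of results are in the same order as the rows of DFLM, so the new columns could also be attached with cbind() if that is more convenient.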
Note that in your sample data frame, the fourth row contains the same column name twice; it is not clear what you intend for this type of situation. I have altered the input slightly, as shown in the Data section below.
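One possible way to handle such rows is to drop them before fitting, along with any pair that names a column missing from DFMyDataBase. This is a sketch, not necessarily what you want: the pair table below is hypothetical (row 3 repeats a symbol, row 4 refers to a column that is absent from the sample data), and DFLM_valid and models_valid are names introduced here for illustration. It also uses reformulate() with Map() as an alternative to pasting formula strings.

```r
# Sample data copied from the Data section of this answer (DATE column omitted)
DFMyDataBase <- data.frame(
  AUDCAD = c(0.96596, 0.97176, 0.97507, 0.98072, 0.98375, 0.98332),
  AUDCHF = c(0.74223, 0.74641, 0.7493, 0.75454, 0.75654, 0.75607),
  AUDJPY = c(85.315, 85.353, 85.307, 85.873, 85.861, 85.822),
  AUDNZD = c(1.0485, 1.04814, 1.05429, 1.05438, 1.05365, 1.05175)
)

# Hypothetical pair table: row 3 uses the same symbol twice,
# row 4 names a column that does not exist in the sample data
DFLM <- data.frame(
  FirstSymbol  = c("AUDCAD", "AUDCAD", "AUDCAD", "AUDCAD"),
  SecondSymbol = c("AUDCHF", "AUDJPY", "AUDCAD", "AUDUSD"),
  stringsAsFactors = FALSE
)

# Flag rows that cannot give a meaningful regression
same_symbol <- DFLM$FirstSymbol == DFLM$SecondSymbol
known <- DFLM$FirstSymbol %in% names(DFMyDataBase) &
  DFLM$SecondSymbol %in% names(DFMyDataBase)

DFLM_valid <- DFLM[!same_symbol & known, , drop = FALSE]

# Fit one model per remaining pair: FirstSymbol ~ SecondSymbol
models_valid <- Map(
  function(dep, indep) lm(reformulate(indep, response = dep), data = DFMyDataBase),
  DFLM_valid$FirstSymbol, DFLM_valid$SecondSymbol
)
names(models_valid) <- paste(DFLM_valid$FirstSymbol, DFLM_valid$SecondSymbol, sep = "~")

names(models_valid)
```

With this variant the call stored in each model shows the reformulate() expression rather than the expanded formula, so formula(models_valid[[1]]) is the easier way to see which pair a model belongs to.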
Data
DFMyDataBase <- structure(list(DATE = c("05/01/2017", "08/01/2017",
"09/01/2017", "10/01/2017", "11/01/2017", "12/01/2017"), AUDCAD = c(0.96596,
0.97176, 0.97507, 0.98072, 0.98375, 0.98332), AUDCHF = c(0.74223,
0.74641, 0.7493, 0.75454, 0.75654, 0.75607), AUDJPY = c(85.315,
85.353, 85.307, 85.873, 85.861, 85.822), AUDNZD = c(1.0485, 1.04814,
1.05429, 1.05438, 1.05365, 1.05175)), class = "data.frame", row.names = c(NA,
-6L))
DFLM <- structure(list(FirstSymbol = c("AUDCAD", "AUDCAD", "AUDCHF"),
SecondSymbol = c("AUDCHF", "AUDJPY", "AUDNZD")), row.names = c(NA,
3L), class = "data.frame")
DFMyDataBase
#> DATE AUDCAD AUDCHF AUDJPY AUDNZD
#> 1 05/01/2017 0.96596 0.74223 85.315 1.04850
#> 2 08/01/2017 0.97176 0.74641 85.353 1.04814
#> 3 09/01/2017 0.97507 0.74930 85.307 1.05429
#> 4 10/01/2017 0.98072 0.75454 85.873 1.05438
#> 5 11/01/2017 0.98375 0.75654 85.861 1.05365
#> 6 12/01/2017 0.98332 0.75607 85.822 1.05175
DFLM
#> FirstSymbol SecondSymbol
#> 1 AUDCAD AUDCHF
#> 2 AUDCAD AUDJPY
#> 3 AUDCHF AUDNZD
Upvotes: 2