Michael B

Reputation: 257

R: Handling subsets using dynlm

I want to compute the following two regressions using R:

library("dynlm")

zooX = zoo(test[, -1])
lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[1:16])
summary(lmx)

zooX = zoo(test[, -1])
lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[17:31])
summary(lmx)

The only difference between the two models is the subset: [1:16] for the first and [17:31] for the second. They give me the following output:

Time series regression with "zoo" data:
Start = 3, End = 16

Call:
dynlm(formula = d(Euribor3) ~ d(Ois3) + d(CDS) + d(Vstoxx) + 
    d(log(omo)) + d(L(Euribor3, 1)), data = zooX[1:16])

and

Time series regression with "zoo" data:
Start = 19, End = 31

Call:
dynlm(formula = d(Euribor3) ~ d(Ois3) + d(CDS) + d(Vstoxx) + 
    d(log(omo)) + d(L(Euribor3, 1)), data = zooX[17:31])

As the output shows (coefficients, t values, etc. are omitted), the first model uses observations 3 to 16: two observations are lost because the variables are differenced and the dependent variable is additionally lagged once. The second model uses observations 19 to 31, again losing two observations. These transformations therefore open a gap between the two subsets: the first model has End = 16 and the second has Start = 19, so observations 17 and 18 are not included in either regression. My question is which approach makes more sense: leave the subsets as they are, or shift the start of the second subset back by two observations (i.e. to [15:31]) to close the gap? The output would then become:

Time series regression with "zoo" data:
Start = 17, End = 31

Call:
dynlm(formula = d(Euribor3) ~ d(Ois3) + d(CDS) + d(Vstoxx) + 
    d(log(omo)) + d(L(Euribor3, 1)), data = zooX[15:31])

As you can see, the first model still has End = 16 and the second model now has Start = 17. Many thanks!

Here is my data set:

Date    Euribor3    Ois3    Vstoxx  CDS omo
03.01.2005  2.154   2.089   14.47   17.938  344999
04.01.2005  2.151   2.084   14.51   17.886  344999
05.01.2005  2.151   2.087   14.42   17.95   333998
06.01.2005  2.15    2.085   13.8    17.95   333998
07.01.2005  2.146   2.086   13.57   17.913  333998
10.01.2005  2.146   2.087   12.92   17.958  333998
11.01.2005  2.146   2.089   13.68   17.962  333998
12.01.2005  2.145   2.085   14.05   17.886  339999
13.01.2005  2.144   2.084   13.64   17.568  339999
14.01.2005  2.144   2.085   13.57   17.471  339999
17.01.2005  2.143   2.085   13.2    17.365  339999
18.01.2005  2.144   2.085   13.17   17.214  347999
19.01.2005  2.143   2.086   13.63   17.143  354499
20.01.2005  2.144   2.087   14.17   17.125  354499
21.01.2005  2.143   2.087   13.96   17.193  354499
24.01.2005  2.143   2.086   14.11   17.283  354499
25.01.2005  2.144   2.086   13.63   17.083  354499
26.01.2005  2.143   2.086   13.32   17.348  347999
27.01.2005  2.144   2.085   12.46   17.295  352998
28.01.2005  2.144   2.084   12.81   17.219  352998
31.01.2005  2.142   2.084   12.72   17.143  352998
01.02.2005  2.142   2.083   12.36   17.125  352998
02.02.2005  2.141   2.083   12.25   17  357499
03.02.2005  2.144   2.088   12.38   16.808  357499
04.02.2005  2.142   2.084   11.6    16.817  357499
07.02.2005  2.142   2.084   11.99   16.798  359999
08.02.2005  2.141   2.083   11.92   16.804  355500
09.02.2005  2.142   2.08    12.19   16.589  355500
10.02.2005  2.14    2.08    12.04   16.5    355500
11.02.2005  2.14    2.078   11.99   16.429  355500
14.02.2005  2.139   2.078   12.52   16.042  355500

Upvotes: 0

Views: 483

Answers (1)

Achim Zeileis

Reputation: 17193

If you do the subsetting yourself via data = zooX[...,], dynlm() doesn't see the full sample and hence has to lose two observations at the start of each subset. If you instead supply the full data = zooX and set end = 16 for the first model and start = 17 for the second, dynlm() can first put together the full model frame with all lags, differences, etc. and only then select the desired subset. In the example below I do this, but using the proper date index that is available in your data.
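To see where the observations go, here is a minimal base-R sketch that does not use dynlm at all. It assumes an integer time index 1..31 and only the Euribor3 column from the question; `model_frame` is a hypothetical helper name that mimics aligning d(y) with d(L(y, 1)), not dynlm's actual internals.

```r
# Euribor3 from the question, on an integer time index 1..31 (base R "ts")
euribor3 <- ts(c(2.154, 2.151, 2.151, 2.15, 2.146, 2.146, 2.146, 2.145,
                 2.144, 2.144, 2.143, 2.144, 2.143, 2.144, 2.143, 2.143,
                 2.144, 2.143, 2.144, 2.144, 2.142, 2.142, 2.141, 2.144,
                 2.142, 2.142, 2.141, 2.142, 2.14, 2.14, 2.139))

# Hypothetical helper: align d(y) with d(L(y, 1))
model_frame <- function(y) {
  dy  <- diff(y)                   # d(y)
  dly <- diff(stats::lag(y, -1))   # d(L(y, 1))
  ts.intersect(dy, dly)            # keep only time points where both exist
}

# Subset first, then transform: each half loses its first two observations
a1 <- model_frame(window(euribor3, end = 16))
a2 <- model_frame(window(euribor3, start = 17))
range(time(a1))  # 3 16
range(time(a2))  # 19 31

# Transform first, then subset: only the first two observations overall are lost
full <- model_frame(euribor3)
b1 <- window(full, end = 16)
b2 <- window(full, start = 17)
range(time(b1))  # 3 16
range(time(b2))  # 17 31
```

In the second variant the two samples are contiguous, which is the behaviour obtained by passing the full series to dynlm() together with start and end.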

First, we read the data into a zoo series with proper Date index:

library("dynlm")
zooX <- read.zoo(textConnection("Date    Euribor3    Ois3    Vstoxx  CDS omo
03.01.2005  2.154   2.089   14.47   17.938  344999
04.01.2005  2.151   2.084   14.51   17.886  344999
05.01.2005  2.151   2.087   14.42   17.95   333998
06.01.2005  2.15    2.085   13.8    17.95   333998
07.01.2005  2.146   2.086   13.57   17.913  333998
10.01.2005  2.146   2.087   12.92   17.958  333998
11.01.2005  2.146   2.089   13.68   17.962  333998
12.01.2005  2.145   2.085   14.05   17.886  339999
13.01.2005  2.144   2.084   13.64   17.568  339999
14.01.2005  2.144   2.085   13.57   17.471  339999
17.01.2005  2.143   2.085   13.2    17.365  339999
18.01.2005  2.144   2.085   13.17   17.214  347999
19.01.2005  2.143   2.086   13.63   17.143  354499
20.01.2005  2.144   2.087   14.17   17.125  354499
21.01.2005  2.143   2.087   13.96   17.193  354499
24.01.2005  2.143   2.086   14.11   17.283  354499
25.01.2005  2.144   2.086   13.63   17.083  354499
26.01.2005  2.143   2.086   13.32   17.348  347999
27.01.2005  2.144   2.085   12.46   17.295  352998
28.01.2005  2.144   2.084   12.81   17.219  352998
31.01.2005  2.142   2.084   12.72   17.143  352998
01.02.2005  2.142   2.083   12.36   17.125  352998
02.02.2005  2.141   2.083   12.25   17  357499
03.02.2005  2.144   2.088   12.38   16.808  357499
04.02.2005  2.142   2.084   11.6    16.817  357499
07.02.2005  2.142   2.084   11.99   16.798  359999
08.02.2005  2.141   2.083   11.92   16.804  355500
09.02.2005  2.142   2.08    12.19   16.589  355500
10.02.2005  2.14    2.08    12.04   16.5    355500
11.02.2005  2.14    2.078   11.99   16.429  355500
14.02.2005  2.139   2.078   12.52   16.042  355500
"), header = TRUE, format = "%d.%m.%Y")

Then, we select the time index of the observation in the middle. You can either do this manually or based on the zooX series:

mid <- as.Date("2005-01-24")                  # set manually, or ...
mid <- time(zooX)[ceiling(nrow(zooX)/2)]      # ... derive it from the zooX index

In either case mid now represents the middle time index 2005-01-24. And then we can set up our models:

f <- d(Euribor3) ~ d(Ois3) + d(CDS) + d(Vstoxx) + d(log(omo)) + d(L(Euribor3))
m1 <- dynlm(f, data = zooX, end = mid)
m2 <- dynlm(f, data = zooX, start = mid + 1)

The first one now runs up to 2005-01-24:

Time series regression with "zoo" data:
Start = 2005-01-05, End = 2005-01-24

Call:
dynlm(formula = f, data = zooX, end = mid)

Coefficients:
   (Intercept)         d(Ois3)          d(CDS)       d(Vstoxx)     d(log(omo))  
    -0.0008859      -0.0125676       0.0001073       0.0012116       0.0124502  
d(L(Euribor3))  
    -0.4217354  

And the second one starts on 2005-01-25:

Time series regression with "zoo" data:
Start = 2005-01-25, End = 2005-02-14

Call:
dynlm(formula = f, data = zooX, start = mid + 1)

Coefficients:
   (Intercept)         d(Ois3)          d(CDS)       d(Vstoxx)     d(log(omo))  
    -0.0005556       0.2565964      -0.0027670      -0.0009140      -0.0152427  
d(L(Euribor3))  
    -0.5143080  

However, I'm not sure whether I would recommend fitting a model with 6 regression coefficients to only 14 or 15 observations...
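As a rough back-of-the-envelope check of that concern, the residual degrees of freedom can be counted by hand. The sketch below hard-codes the sample sizes implied by the two outputs above (14 and 15 observations) rather than extracting them from the fitted models; the variable names are illustrative only.

```r
# Observations used by the first and second model, read off the Start/End dates
n_obs <- c(first = 14, second = 15)

# Intercept plus five regressors in the formula
k <- 6

# Residual degrees of freedom left for estimating the error variance
resid_df <- n_obs - k
resid_df  # first: 8, second: 9
```

With fitted dynlm objects at hand, df.residual(m1) and df.residual(m2) should report the same quantities.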

Upvotes: 1
