Reputation: 17
I am taking an online edX course on interrupted time series analysis that makes use of R and part of the course shows us how to derive predicted values from the gls model as well as get the absolute and relative change of the predicted vs the counterfactual.
Unfortunately, there is no example of how to get 95% confidence intervals around these predicted changes. On the course discussion board, the instructor linked to this article (Zhang et al, 2009.) where the authors provide SAS code, linked at the end of the 'Methods' section, to get these CIs, but the instructor does not have code that implements this in R.
Someone suggested I could use the R marginaleffects
package to do this. I am trying to apply the materials from this course to a work project, but this one piece of getting CIs around the pred, cfac, and absolute/relative changes (pred - cfac) is stumping me.
Just since I can't share my actual data from work, here's the example data the course gives us (csv), "flow" is the outcome, as in water flow of a river being studied:
year,flow,time,level,trend
1871,3533.916667,1,0,0
1872,3664.25,2,0,0
1873,3047.5,3,0,0
1874,3808.583333,4,0,0
1875,3677.5,5,0,0
1876,3663.416667,6,0,0
1877,2667.583333,7,0,0
1878,3900.083333,8,0,0
1879,4326.666667,9,0,0
1880,3590.583333,10,0,0
1881,3147.25,11,0,0
1882,2954.083333,12,0,0
1883,3498.416667,13,0,0
1884,3142.166667,14,0,0
1885,3213.916667,15,0,0
1886,3035,16,0,0
1887,3711.75,17,0,0
1888,2521.666667,18,0,0
1889,3023.5,19,0,0
1890,3507.5,20,0,0
1891,3493.666667,21,0,0
1892,4268.75,22,0,0
1893,3635.583333,23,0,0
1894,3946.416667,24,0,0
1895,3978.166667,25,0,0
1896,3856,26,0,0
1897,3248.166667,27,0,0
1898,3479.833333,28,1,1
1899,2453,29,1,2
1900,2649.166667,30,1,3
1901,2764.916667,31,1,4
1902,2196.666667,32,1,5
1903,2973.583333,33,1,6
1904,2613.416667,34,1,7
1905,2219.333333,35,1,8
1906,2901,36,1,9
1907,2186.75,37,1,10
1908,3215.333333,38,1,11
1909,3318.25,39,1,12
1910,3063.666667,40,1,13
1911,2605.583333,41,1,14
1912,2239.75,42,1,15
1913,1441.25,43,1,16
1914,2654.166667,44,1,17
1915,2191.75,45,1,18
1916,3570.333333,46,1,19
1917,3498.166667,47,1,20
1918,2564.583333,48,1,21
1919,2434.166667,49,1,22
1920,2594.333333,50,1,23
1921,2415.166667,51,1,24
1922,2683.583333,52,1,25
1923,2733.416667,53,1,26
1924,2729.416667,54,1,27
1925,2138.083333,55,1,28
1926,2665.916667,56,1,29
1927,2306.25,57,1,30
1928,2495.416667,58,1,31
1929,3241.083333,59,1,32
1930,2334.416667,60,1,33
This is what the model looks like, the instructor uses the gls function from the nlme
package:
# Fit the GLS regression model
model_p10 <- gls(flow ~ time + level + trend,
data=data,
correlation=corARMA(p=10,form=~time),
method="ML")
summary(model_p10)
Output:
Coefficients:
Value p-value
(Intercept) 3348.033 0.0000
time 6.745 0.2871
level -738.245 0.0000
trend -14.210 0.0397
This is what the model is doing graphically:
That predicted changes code is then ran to calculate the predicted value 25 years AFTER the "weather change" i.e. the interruption, indicated by the level variable in the data which flips from 0 to 1 at time=28, thus there are 27 years before this "weather change"/interruption, 27 years BEFORE the weather change + 25 years AFTER the weather change = 52 year.
To clarify, the idea of the counterfactual here is not simply that level=0, but that the post-interruption change in level and trend/slope did not occur (i.e. the level and trend coefficients that encode the change in level and trend after the interruption should not be included in the counterfactual calculation).
# Predicted value at 25 years after the weather change
pred <- fitted(model_p10)[52]
# Then estimate the counterfactual at the same time point
cfac <- model_p10$coef[1] + model_p10$coef[2]*52
# Absolute change at 25 years
pred - cfac
# Relative change at 25 years
(pred - cfac) / cfac
Output:
> # Absolute change at 25 years
> pred - cfac
52
-1093.483
> # Relative change at 25 years
> (pred - cfac) / cfac
52
-0.2956351
Questions:
I see that the marginaleffects
package's predictions( ) function will give me the pred values and 95% CIs around them. How would I then also get CIs for the counterfactual (cfac)?
And how would I specify the comparisons( ) function to get CIs for the absolute and relative (i.e. %) change between the predicted vs counterfactual (i.e. pred - cfac)? When I tried it out it kept giving me the same difference no matter the time point I input.
I did the following:
library(marginaleffects)
comparisons(model_orig_q1, variables = "level", newdata = datagrid(time = 52))
marginaleffects
package? So instead of saying the trend in water flow changed by -14.2 (-10, -21), I could instead say it decreased by 10% (5%, 15%) or whatever it'd be.Upvotes: 0
Views: 79
Reputation: 17823
First off, you are not calculating the counterfactual prediction correctly, since your multiplying by 52, rather than by the variable value in the 52nd row.
library(nlme)
library(marginaleffects)
data = read.csv("flow.csv")
model_p10 <- gls(flow ~ time + level + trend,
data = data,
correlation = corARMA(p = 10, form = ~time),
method = "ML")
Factual and counterfactual data
d_fact <- data[52, , drop = FALSE]
d_cfac <- transform(d_fact, level = 0)
Factual and counterfactual data
p_fact <- predictions(model_p10, newdata = d_fact)
p_cfac <- predictions(model_p10, newdata = d_cfac)
p_fact
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % year time level
#> 2605 48.1 54.2 <0.001 Inf 2511 2699 1922 52 1
#> trend
#> 25
p_cfac
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % year time level
#> 3344 154 21.7 <0.001 344.6 3042 3645 1922 52 0
#> trend
#> 25
Counterfactual comparison
comparisons(model_p10,
variables = "level",
newdata = d_fact)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % year time trend
#> -738 133 -5.56 <0.001 25.2 -998 -478 1922 52 25
# Equivalent to:
p_fact$estimate - p_cfac$estimate
#> [1] -738.2451
Ratio instead of difference
comparisons(model_p10,
comparison = "ratio",
variables = "level",
newdata = d_fact)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % year time trend
#> 0.779 0.0301 25.9 <0.001 487.3 0.72 0.838 1922 52 25
# Equivalent to:
p_fact$estimate / p_cfac$estimate
#> [1] 0.7792013
Upvotes: 1