M. Yates
M. Yates

Reputation: 17

R marginaleffects package: Estimate 95% CI for predicted minus counterfactual absolute/relative changes with an interrupted time series

I am taking an online edX course on interrupted time series analysis that makes use of R and part of the course shows us how to derive predicted values from the gls model as well as get the absolute and relative change of the predicted vs the counterfactual.

Unfortunately, there is no example of how to get 95% confidence intervals around these predicted changes. On the course discussion board, the instructor linked to this article (Zhang et al, 2009.) where the authors provide SAS code, linked at the end of the 'Methods' section, to get these CIs, but the instructor does not have code that implements this in R.

Someone suggested I could use the R marginaleffects package to do this. I am trying to apply the materials from this course to a work project, but this one piece of getting CIs around the pred, cfac, and absolute/relative changes (pred - cfac) is stumping me. Just since I can't share my actual data from work, here's the example data the course gives us (csv), "flow" is the outcome, as in water flow of a river being studied:

year,flow,time,level,trend
1871,3533.916667,1,0,0
1872,3664.25,2,0,0
1873,3047.5,3,0,0
1874,3808.583333,4,0,0
1875,3677.5,5,0,0
1876,3663.416667,6,0,0
1877,2667.583333,7,0,0
1878,3900.083333,8,0,0
1879,4326.666667,9,0,0
1880,3590.583333,10,0,0
1881,3147.25,11,0,0
1882,2954.083333,12,0,0
1883,3498.416667,13,0,0
1884,3142.166667,14,0,0
1885,3213.916667,15,0,0
1886,3035,16,0,0
1887,3711.75,17,0,0
1888,2521.666667,18,0,0
1889,3023.5,19,0,0
1890,3507.5,20,0,0
1891,3493.666667,21,0,0
1892,4268.75,22,0,0
1893,3635.583333,23,0,0
1894,3946.416667,24,0,0
1895,3978.166667,25,0,0
1896,3856,26,0,0
1897,3248.166667,27,0,0
1898,3479.833333,28,1,1
1899,2453,29,1,2
1900,2649.166667,30,1,3
1901,2764.916667,31,1,4
1902,2196.666667,32,1,5
1903,2973.583333,33,1,6
1904,2613.416667,34,1,7
1905,2219.333333,35,1,8
1906,2901,36,1,9
1907,2186.75,37,1,10
1908,3215.333333,38,1,11
1909,3318.25,39,1,12
1910,3063.666667,40,1,13
1911,2605.583333,41,1,14
1912,2239.75,42,1,15
1913,1441.25,43,1,16
1914,2654.166667,44,1,17
1915,2191.75,45,1,18
1916,3570.333333,46,1,19
1917,3498.166667,47,1,20
1918,2564.583333,48,1,21
1919,2434.166667,49,1,22
1920,2594.333333,50,1,23
1921,2415.166667,51,1,24
1922,2683.583333,52,1,25
1923,2733.416667,53,1,26
1924,2729.416667,54,1,27
1925,2138.083333,55,1,28
1926,2665.916667,56,1,29
1927,2306.25,57,1,30
1928,2495.416667,58,1,31
1929,3241.083333,59,1,32
1930,2334.416667,60,1,33

This is what the model looks like, the instructor uses the gls function from the nlme package:

# Fit the GLS regression model
model_p10 <- gls(flow ~ time + level + trend,
    data=data,
    correlation=corARMA(p=10,form=~time),
    method="ML")

summary(model_p10)

Output:

Coefficients:

                 Value         p-value
(Intercept)      3348.033      0.0000
time                6.745      0.2871
level            -738.245      0.0000
trend             -14.210      0.0397

This is what the model is doing graphically:

enter image description here

That predicted changes code is then ran to calculate the predicted value 25 years AFTER the "weather change" i.e. the interruption, indicated by the level variable in the data which flips from 0 to 1 at time=28, thus there are 27 years before this "weather change"/interruption, 27 years BEFORE the weather change + 25 years AFTER the weather change = 52 year.

To clarify, the idea of the counterfactual here is not simply that level=0, but that the post-interruption change in level and trend/slope did not occur (i.e. the level and trend coefficients that encode the change in level and trend after the interruption should not be included in the counterfactual calculation).

# Predicted value at 25 years after the weather change
pred <- fitted(model_p10)[52]
# Then estimate the counterfactual at the same time point
cfac <- model_p10$coef[1] + model_p10$coef[2]*52
# Absolute change at 25 years
pred - cfac
# Relative change at 25 years
(pred - cfac) / cfac

Output:

> # Absolute change at 25 years
> pred - cfac
52
-1093.483

> # Relative change at 25 years
> (pred - cfac) / cfac
52
-0.2956351

Questions:

  1. I see that the marginaleffects package's predictions( ) function will give me the pred values and 95% CIs around them. How would I then also get CIs for the counterfactual (cfac)?

  2. And how would I specify the comparisons( ) function to get CIs for the absolute and relative (i.e. %) change between the predicted vs counterfactual (i.e. pred - cfac)? When I tried it out it kept giving me the same difference no matter the time point I input.

I did the following:

library(marginaleffects) 
comparisons(model_orig_q1, variables = "level",   newdata = datagrid(time = 52))
  1. Also, this is a bit unrelated, but is there a way to get level and trend coefficients in the model, which currently give the absolute differences between the pre vs post interruption level and trend respectively, as a relative percentage with 95% CIs using the marginaleffects package? So instead of saying the trend in water flow changed by -14.2 (-10, -21), I could instead say it decreased by 10% (5%, 15%) or whatever it'd be.

Upvotes: 0

Views: 79

Answers (1)

Vincent
Vincent

Reputation: 17823

First off, you are not calculating the counterfactual prediction correctly, since your multiplying by 52, rather than by the variable value in the 52nd row.

library(nlme)
library(marginaleffects)

data = read.csv("flow.csv")
model_p10 <- gls(flow ~ time + level + trend,
  data = data,
  correlation = corARMA(p = 10, form = ~time),
  method = "ML")

Factual and counterfactual data

d_fact <- data[52, , drop = FALSE]
d_cfac <- transform(d_fact, level = 0)

Factual and counterfactual data

p_fact <- predictions(model_p10, newdata = d_fact)
p_cfac <- predictions(model_p10, newdata = d_cfac)
p_fact
#> 
#>  Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 % year time level
#>      2605       48.1 54.2   <0.001 Inf  2511   2699 1922   52     1
#>  trend
#>     25

p_cfac
#> 
#>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 % year time level
#>      3344        154 21.7   <0.001 344.6  3042   3645 1922   52     0
#>  trend
#>     25

Counterfactual comparison

comparisons(model_p10,
  variables = "level",
  newdata = d_fact)
#> 
#>  Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 % year time trend
#>      -738        133 -5.56   <0.001 25.2  -998   -478 1922   52    25

# Equivalent to:
p_fact$estimate - p_cfac$estimate
#> [1] -738.2451

Ratio instead of difference

comparisons(model_p10,
  comparison = "ratio",
  variables = "level",
  newdata = d_fact)
#> 
#>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 % year time trend
#>     0.779     0.0301 25.9   <0.001 487.3  0.72  0.838 1922   52    25

# Equivalent to:
p_fact$estimate / p_cfac$estimate
#> [1] 0.7792013

Upvotes: 1

Related Questions