Reputation: 2780
Currently I only report means for calculations which I show below, but I would like to add confidence intervals.
If I have the data in the correct format it would not be hard for me to use linear regression, lm(),
to calculate estimated group differences and their intervals, but I am having difficulty getting the data in the correct format.
Here is some data:
> library(dplyr)  # provides tibble(), bind_rows(), group_by(), summarise() and %>%
> set.seed(909)
> d2017pre <- tibble(n = rnorm(25, mean = 1100, sd = 10),period = "pre", year = 2017)
> d2016pre <- tibble(n = rnorm(25, mean = 1500, sd = 10),period = "pre", year = 2016)
> d2017post <- tibble(n = rnorm(25, mean = 1000, sd = 10),period = "post", year = 2017)
> d2016post <- tibble(n = rnorm(25, mean = 900, sd = 10),period = "post", year = 2016)
> df <- bind_rows(d2017pre,d2016pre,d2017post,d2016post)
> df %>% group_by(year,period) %>% summarise(mean(n))
# A tibble: 4 x 3
# Groups: year [?]
year period `mean(n)`
<dbl> <chr> <dbl>
1 2016 post 899
2 2016 pre 1498
3 2017 post 999
4 2017 pre 1104
These are the three calculations I routinely do.
> # pre - post 2016
> pp16 <- 1498 - 899
> pp16
[1] 599
>
> # pre - post 2017
> pp17 <- 1104 - 999
> pp17
[1] 105
>
> # net of control: pp2016 - pp2017
> noc <- pp16 - pp17
> noc
[1] 494
The questions this answers are:
What was the difference between the pre and post period in 2016 or 2017?
Was 2017's pre/post difference greater than 2016's pre/post difference?
I would like to answer these questions not just with estimates but also with confidence intervals. As mentioned above, I am planning on using lm() to get the confidence intervals of the differences, but I am having difficulty getting the data in the correct format.
I believe that this will require two data sets. One for the difference of the periods in the year and one for the differences of the differences (net of control). This leads to the following questions.
How can I calculate the differences of n grouped by period and year?
How can I calculate the differences of differences?
Upvotes: 0
Views: 45
Reputation: 5893
First, you can get the differences using another group_by.
diffs <- df %>%
  group_by(year, period) %>%
  summarise(mean = mean(n)) %>%
  group_by(year) %>%
  # rows are ordered post, pre within each year, so diff() gives pre - post
  summarise(diff = diff(mean))
# A tibble: 2 x 2
year diff
<dbl> <dbl>
1 2016 599
2 2017 105
The difference of the differences is computed similarly (diffs may be a poor name, since it is easy to confuse with the function diff()):
diff(rev(diffs$diff))
[1] 493.8846
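Because diff() depends on the order of the rows, it can be safer to name the two periods explicitly. The following is only a sketch under the assumption that the data look like the df in the question (it is rebuilt here so the block runs on its own); the name did is hypothetical and just stands for the "net of control" value.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(909)
df <- bind_rows(
  tibble(n = rnorm(25, mean = 1100, sd = 10), period = "pre",  year = 2017),
  tibble(n = rnorm(25, mean = 1500, sd = 10), period = "pre",  year = 2016),
  tibble(n = rnorm(25, mean = 1000, sd = 10), period = "post", year = 2017),
  tibble(n = rnorm(25, mean = 900,  sd = 10), period = "post", year = 2016)
)

# pre - post within each year, written out explicitly
diffs <- df %>%
  group_by(year) %>%
  summarise(diff = mean(n[period == "pre"]) - mean(n[period == "post"]))

# difference of differences: 2016 minus 2017 (hypothetical name)
did <- diffs$diff[diffs$year == 2016] - diffs$diff[diffs$year == 2017]

diffs
did
```

Written this way, the result does not change if the period labels are renamed or the rows are reordered.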
For the regression, you do not actually need to alter your data frame; the raw data is what the model uses to estimate the effects. If I understand correctly, you are looking for a model with an interaction effect.
E.g.,
m1 <- lm(n ~ period + factor(year) + period*factor(year), data = df)
summary(m1)
Note how the interaction effect is basically that difference of differences; its sign depends on which period and year are the reference levels.
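To get the confidence intervals the question asks for, one option is confint() on the fitted model. The sketch below is an assumption about how this could be set up, not the only way: it rebuilds the example data, sets the factor levels explicitly so that the coefficients match the question's "pre - post" and "2016 - 2017" directions, and refits with a different reference year to read off the other year's pre/post difference. The names m2016 and ci are hypothetical.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(909)
df <- bind_rows(
  tibble(n = rnorm(25, mean = 1100, sd = 10), period = "pre",  year = 2017),
  tibble(n = rnorm(25, mean = 1500, sd = 10), period = "pre",  year = 2016),
  tibble(n = rnorm(25, mean = 1000, sd = 10), period = "post", year = 2017),
  tibble(n = rnorm(25, mean = 900,  sd = 10), period = "post", year = 2016)
) %>%
  mutate(
    period = factor(period, levels = c("post", "pre")),  # reference: post
    year   = factor(year, levels = c(2017, 2016))        # reference: 2017
  )

# periodpre          = pre - post in 2017 (the reference year)
# periodpre:year2016 = (pre - post in 2016) - (pre - post in 2017)
m1 <- lm(n ~ period * year, data = df)
ci <- confint(m1, level = 0.95)
ci[c("periodpre", "periodpre:year2016"), ]

# Refit with 2016 as the reference year: periodpre is now pre - post in 2016
m2016 <- lm(n ~ period * year,
            data = mutate(df, year = relevel(year, ref = "2016")))
confint(m2016)["periodpre", ]
```

Both fits are the same model with a different parameterisation, so the residuals and the fitted group means are identical; only the meaning of the coefficients changes.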
Upvotes: 1