Reputation: 455
I want to compute the within, between, and overall standard deviations of panel data in R. I found a very similar question, "Between/within standard deviations in R", but I don't know how to apply its solution to my data.
Let us use the following data as an example:
library(foreign)
Panel <-read.dta("http://dss.princeton.edu/training/Panel101.dta")
giving the following output:
country year y y_bin x1 x2 x3 opinion
1 A 1990 1342787840 1 0.27790365 -1.1079559 0.28255358 Str agree
2 A 1991 -1899660544 0 0.32068470 -0.9487200 0.49253848 Disag
3 A 1992 -11234363 0 0.36346573 -0.7894840 0.70252335 Disag
4 A 1993 2645775360 1 0.24614404 -0.8855330 -0.09439092 Disag
5 A 1994 3008334848 1 0.42462304 -0.7297683 0.94613063 Disag
6 A 1995 3229574144 1 0.47721413 -0.7232460 1.02968037 Str agree
7 A 1996 2756754176 1 0.49980500 -0.7815716 1.09228814 Disag
8 A 1997 2771810560 1 0.05162839 -0.7048455 1.41590083 Str agree
9 A 1998 3397338880 1 0.36641079 -0.6983712 1.54872274 Disag
10 A 1999 39770336 1 0.39584252 -0.6431540 1.79419804 Str disag
11 B 1990 -5934699520 0 -0.08184998 1.4251202 0.02342812 Agree
12 B 1991 -711623744 0 0.10616001 1.6496018 0.26036251 Str agree
13 B 1992 -1933116160 0 0.35378519 1.5937191 -0.23439877 Agree
14 B 1993 3072741632 1 0.72677696 1.6917576 0.25622433 Str disag
15 B 1994 3768078848 1 0.71939486 1.7414261 0.41174951 Disag
16 B 1995 2837581312 1 0.67154658 1.7083139 0.53584301 Str disag
17 B 1996 577199360 1 0.81985730 1.5324961 -0.49964902 Str agree
18 B 1997 1786851584 1 0.88016719 1.5021962 -0.57626772 Disag
19 B 1998 -149072048 0 0.70451611 1.4236463 -0.44841924 Agree
20 B 1999 -1174480128 0 0.23696731 1.4545859 -0.04936399 Str disag
21 C 1990 -1292379264 0 1.31256068 -1.2931356 0.20408297 Agree
22 C 1991 -3415966464 0 1.17748356 -1.3442180 0.28397188 Str agree
23 C 1992 -355804672 0 1.25640798 -1.2599510 0.37339270 Agree
24 C 1993 1225180032 1 1.42154455 -1.3117452 -0.37596563 Disag
25 C 1994 3802287616 1 1.11419308 -1.2849948 0.56046754 Str disag
26 C 1995 1959696640 1 1.15948391 -1.2188276 0.69540799 Agree
27 C 1996 530576672 1 1.16045427 -1.2350063 0.81689382 Agree
28 C 1997 3128852224 1 1.44641161 -1.3275964 -0.14206907 Str disag
29 C 1998 3201045760 1 1.15162671 -1.2061129 1.19458139 Str agree
30 C 1999 4663067648 1 1.19054413 -1.1266172 1.67016041 Disag
31 D 1990 1883025152 1 -0.31391269 1.7366557 0.64663702 Disag
32 D 1991 6037768704 1 0.36009100 2.1318641 1.09994173 Disag
33 D 1992 10244189 1 0.05188770 1.6816775 0.96976823 Str agree
34 D 1993 5067265024 1 0.20944354 1.6149769 -0.21257821 Str agree
35 D 1994 3882478336 1 0.38207000 1.5683011 -1.16538668 Disag
36 D 1995 8827006976 1 0.24208580 1.5412215 -0.18413101 Agree
37 D 1996 5782000128 1 0.48636678 1.7423391 -0.03731453 Str disag
38 D 1997 5090524160 1 0.35942599 1.8742865 0.08786795 Str agree
39 D 1998 1850565248 1 0.23220351 1.5953021 0.07247547 Disag
40 D 1999 -2025476864 0 -0.07998896 1.7047973 0.55843300 Str agree
41 E 1990 1342787840 1 0.45286715 1.7284026 0.59705788 Str disag
42 E 1991 2296009472 1 0.41904032 1.7068400 0.79313534 Str agree
43 E 1992 1737627776 1 0.38521346 1.6852775 0.98921281 Agree
44 E 1993 113973136 1 -0.24428773 1.6492835 1.22413278 Str agree
45 E 1994 260098048 1 1.39113998 2.5302765 -0.52620137 Str disag
46 E 1995 -7863482880 0 0.31968558 1.1890552 -0.48425370 Agree
47 E 1996 3520491520 1 0.61097682 1.4845277 -0.97895509 Agree
48 E 1997 5234565120 1 0.71761495 1.5544620 -0.98863661 Str disag
49 E 1998 344746176 1 0.69613826 1.7010406 -0.08965246 Disag
50 E 1999 243920688 1 0.60662067 1.6119040 -0.08929884 Str disag
51 F 1990 1342787840 1 -0.56757486 -0.3466710 1.25841928 Str agree
52 F 1991 3560401920 1 0.15974578 -0.4641182 0.32665297 Str disag
53 F 1992 3192281088 1 0.88706642 -0.5815655 -0.60511333 Agree
54 F 1993 8941232128 1 0.53241795 -0.7553238 -0.51157588 Agree
55 F 1994 8124504576 1 0.87260014 -0.7114431 0.20570269 Str agree
56 F 1995 491740096 1 0.91935229 -0.3697441 -0.01292755 Str agree
57 F 1996 3497164544 1 1.39689231 -0.3601406 0.67867643 Str agree
58 F 1997 4764803072 1 0.98688608 -0.3590902 0.24226174 Str agree
59 F 1998 -4671723520 0 0.78830910 -0.7556524 0.73347801 Agree
60 F 1999 6349319168 1 0.27938697 -0.4601679 1.17317200 Disag
61 G 1990 1342787840 1 0.94488174 -1.5150151 1.45265734 Str disag
62 G 1991 -1518985728 0 1.09872830 -1.4614717 1.43964469 Agree
63 G 1992 1912769920 1 1.25257492 -1.4079282 1.42663205 Str agree
64 G 1993 1345690240 1 0.76276451 -1.3519315 1.85448635 Str disag
65 G 1994 2793515008 1 1.20645559 -1.3252175 2.23653030 Str disag
66 G 1995 1323696384 1 1.08718646 -1.4098167 2.82980847 Str disag
67 G 1996 254524176 1 0.78107548 -1.3279996 4.27822399 Str agree
68 G 1997 3297033216 1 1.25787950 -1.5773667 4.58732557 Disag
69 G 1998 3011820800 1 1.24277663 -1.6012177 6.11376190 Disag
70 G 1999 3296283392 1 1.23420024 -1.6217614 7.16892195 Disag
The within standard deviation should capture the variation within a country over the years, whereas the between standard deviation should capture the variation between countries. The output should therefore be three standard deviations (within, between, and overall) for every variable (here: x1, x2, x3). PS: I am also using the plm and reshape2 packages.
EDIT: As a second step, I calculate the mean of each variable for every country with
library(dplyr)
Panel_mean <- Panel %>% group_by(country) %>% summarise(mean(x1), mean(x2), mean(x3))
I get the variance between countries (the variance of the country means) with:
Panel %>% group_by(country) %>% summarise(across(c(x1, x2, x3), mean)) %>%
summarise(across(c(x1, x2, x3), var))
and the variance between years (the variance of the year means) with:
Panel %>% group_by(year) %>% summarise(across(c(x1, x2, x3), mean)) %>%
summarise(across(c(x1, x2, x3), var))
EDIT 2: Because it was asked, here are my next steps. I want to determine country-specific regressors and plot the unconditional correlations between y and each of these regressors. For each variable I want three groups of plots: 1. the overall correlation; 2. the correlation of the deviations of y and the regressor from their country means (within variation); 3. the correlation of the country means of the variables (between variation).
Here is an example of the output desired:
For the overall correlation I guess I could simply use lm (instead of the plm used for the panel data analysis), as in:
plot(Panel$x1, Panel$y)
abline(lm(y ~ x1, data = Panel))
Or am I completely on the wrong track?
Upvotes: 1
Views: 5595
Reputation: 9618
You can do a lot of calculations with these results; the question is whether they are useful for your purpose. What is the objective of your analysis, and what question do you want to answer with it?
Upvotes: 1
Reputation: 9618
You can do this using dplyr (df is the Panel data frame from the question):
library(dplyr)
df <- Panel
# The within-country variance:
df %>% group_by(country) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [7 x 4]
country var(x1) var(x2) var(x3)
1 A 0.01689254 0.019945743 0.3459071
2 B 0.11111015 0.014658133 0.1578417
3 C 0.01376573 0.004341126 0.3684358
4 D 0.05922682 0.030828768 0.4438790
5 E 0.16660745 0.114101310 0.6562002
6 F 0.30408784 0.029109927 0.3974615
7 G 0.03731913 0.012823557 4.3677278
# The within-year variance:
df %>% group_by(year) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [10 x 4]
year var(x1) var(x2) var(x3)
1 1990 0.4565977 2.215550 0.2904437
2 1991 0.1906246 2.501216 0.2097600
3 1992 0.2307872 2.103001 0.5223656
4 1993 0.2783625 2.172129 0.8009998
5 1994 0.1505808 2.647259 1.1734290
6 1995 0.1356406 1.794507 1.2216286
7 1996 0.1179536 1.909766 2.9574045
8 1997 0.2380631 2.155005 3.5644637
9 1998 0.1375272 2.085431 5.0101764
10 1999 0.2455796 2.004060 6.2910426
# And the overall variance:
apply(df[5:7], 2, var)
x1 x2 x3
0.2190896 1.8799138 2.0918771
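Since the question asks for standard deviations rather than variances, here is a minimal base R sketch of one common decomposition. It assumes the convention I believe Stata's xtsum uses: overall is the plain SD, between is the SD of the group means, and within is the SD of the deviations from each group's mean. The helper name panel_sd and the toy data frame are made up for illustration; they are not part of plm or dplyr.

```r
# Hypothetical helper: overall, between and within SD of x, grouped by id.
panel_sd <- function(x, id) {
  group_mean <- ave(x, id)              # each observation's own group mean
  c(
    overall = sd(x),                    # SD over all observations
    between = sd(tapply(x, id, mean)),  # SD of the group means
    within  = sd(x - group_mean)        # SD of deviations from the group means
  )
}

# Toy panel (made-up numbers): 2 countries, 3 years each.
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  x1 = c(1, 2, 3, 5, 6, 7),
  x2 = c(2, 2, 2, 4, 4, 4)   # constant within country, so within SD is 0
)

# One column per variable, one row per kind of SD.
result <- sapply(toy[c("x1", "x2")], panel_sd, id = toy$country)
print(result)
```

On the data from the question the same call would be sapply(Panel[c("x1", "x2", "x3")], panel_sd, id = Panel$country). Passing Panel$year as id instead gives the decomposition across years.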
Upvotes: 2