Gilles Cosyn

Reputation: 455

Between and within standard deviation of panel data

I want to compute the within, between and overall standard deviations of panel data in R. I found a very similar question, "Between/within standard deviations in R", but I don't know how to apply its solution to my data.

Let us use the following data as an example:

library(foreign)
Panel <-read.dta("http://dss.princeton.edu/training/Panel101.dta")

giving the following output:

   country year           y y_bin          x1         x2          x3   opinion
1        A 1990  1342787840     1  0.27790365 -1.1079559  0.28255358 Str agree
2        A 1991 -1899660544     0  0.32068470 -0.9487200  0.49253848     Disag
3        A 1992   -11234363     0  0.36346573 -0.7894840  0.70252335     Disag
4        A 1993  2645775360     1  0.24614404 -0.8855330 -0.09439092     Disag
5        A 1994  3008334848     1  0.42462304 -0.7297683  0.94613063     Disag
6        A 1995  3229574144     1  0.47721413 -0.7232460  1.02968037 Str agree
7        A 1996  2756754176     1  0.49980500 -0.7815716  1.09228814     Disag
8        A 1997  2771810560     1  0.05162839 -0.7048455  1.41590083 Str agree
9        A 1998  3397338880     1  0.36641079 -0.6983712  1.54872274     Disag
10       A 1999    39770336     1  0.39584252 -0.6431540  1.79419804 Str disag
11       B 1990 -5934699520     0 -0.08184998  1.4251202  0.02342812     Agree
12       B 1991  -711623744     0  0.10616001  1.6496018  0.26036251 Str agree
13       B 1992 -1933116160     0  0.35378519  1.5937191 -0.23439877     Agree
14       B 1993  3072741632     1  0.72677696  1.6917576  0.25622433 Str disag
15       B 1994  3768078848     1  0.71939486  1.7414261  0.41174951     Disag
16       B 1995  2837581312     1  0.67154658  1.7083139  0.53584301 Str disag
17       B 1996   577199360     1  0.81985730  1.5324961 -0.49964902 Str agree
18       B 1997  1786851584     1  0.88016719  1.5021962 -0.57626772     Disag
19       B 1998  -149072048     0  0.70451611  1.4236463 -0.44841924     Agree
20       B 1999 -1174480128     0  0.23696731  1.4545859 -0.04936399 Str disag
21       C 1990 -1292379264     0  1.31256068 -1.2931356  0.20408297     Agree
22       C 1991 -3415966464     0  1.17748356 -1.3442180  0.28397188 Str agree
23       C 1992  -355804672     0  1.25640798 -1.2599510  0.37339270     Agree
24       C 1993  1225180032     1  1.42154455 -1.3117452 -0.37596563     Disag
25       C 1994  3802287616     1  1.11419308 -1.2849948  0.56046754 Str disag
26       C 1995  1959696640     1  1.15948391 -1.2188276  0.69540799     Agree
27       C 1996   530576672     1  1.16045427 -1.2350063  0.81689382     Agree
28       C 1997  3128852224     1  1.44641161 -1.3275964 -0.14206907 Str disag
29       C 1998  3201045760     1  1.15162671 -1.2061129  1.19458139 Str agree
30       C 1999  4663067648     1  1.19054413 -1.1266172  1.67016041     Disag
31       D 1990  1883025152     1 -0.31391269  1.7366557  0.64663702     Disag
32       D 1991  6037768704     1  0.36009100  2.1318641  1.09994173     Disag
33       D 1992    10244189     1  0.05188770  1.6816775  0.96976823 Str agree
34       D 1993  5067265024     1  0.20944354  1.6149769 -0.21257821 Str agree
35       D 1994  3882478336     1  0.38207000  1.5683011 -1.16538668     Disag
36       D 1995  8827006976     1  0.24208580  1.5412215 -0.18413101     Agree
37       D 1996  5782000128     1  0.48636678  1.7423391 -0.03731453 Str disag
38       D 1997  5090524160     1  0.35942599  1.8742865  0.08786795 Str agree
39       D 1998  1850565248     1  0.23220351  1.5953021  0.07247547     Disag
40       D 1999 -2025476864     0 -0.07998896  1.7047973  0.55843300 Str agree
41       E 1990  1342787840     1  0.45286715  1.7284026  0.59705788 Str disag
42       E 1991  2296009472     1  0.41904032  1.7068400  0.79313534 Str agree
43       E 1992  1737627776     1  0.38521346  1.6852775  0.98921281     Agree
44       E 1993   113973136     1 -0.24428773  1.6492835  1.22413278 Str agree
45       E 1994   260098048     1  1.39113998  2.5302765 -0.52620137 Str disag
46       E 1995 -7863482880     0  0.31968558  1.1890552 -0.48425370     Agree
47       E 1996  3520491520     1  0.61097682  1.4845277 -0.97895509     Agree
48       E 1997  5234565120     1  0.71761495  1.5544620 -0.98863661 Str disag
49       E 1998   344746176     1  0.69613826  1.7010406 -0.08965246     Disag
50       E 1999   243920688     1  0.60662067  1.6119040 -0.08929884 Str disag
51       F 1990  1342787840     1 -0.56757486 -0.3466710  1.25841928 Str agree
52       F 1991  3560401920     1  0.15974578 -0.4641182  0.32665297 Str disag
53       F 1992  3192281088     1  0.88706642 -0.5815655 -0.60511333     Agree
54       F 1993  8941232128     1  0.53241795 -0.7553238 -0.51157588     Agree
55       F 1994  8124504576     1  0.87260014 -0.7114431  0.20570269 Str agree
56       F 1995   491740096     1  0.91935229 -0.3697441 -0.01292755 Str agree
57       F 1996  3497164544     1  1.39689231 -0.3601406  0.67867643 Str agree
58       F 1997  4764803072     1  0.98688608 -0.3590902  0.24226174 Str agree
59       F 1998 -4671723520     0  0.78830910 -0.7556524  0.73347801     Agree
60       F 1999  6349319168     1  0.27938697 -0.4601679  1.17317200     Disag
61       G 1990  1342787840     1  0.94488174 -1.5150151  1.45265734 Str disag
62       G 1991 -1518985728     0  1.09872830 -1.4614717  1.43964469     Agree
63       G 1992  1912769920     1  1.25257492 -1.4079282  1.42663205 Str agree
64       G 1993  1345690240     1  0.76276451 -1.3519315  1.85448635 Str disag
65       G 1994  2793515008     1  1.20645559 -1.3252175  2.23653030 Str disag
66       G 1995  1323696384     1  1.08718646 -1.4098167  2.82980847 Str disag
67       G 1996   254524176     1  0.78107548 -1.3279996  4.27822399 Str agree
68       G 1997  3297033216     1  1.25787950 -1.5773667  4.58732557     Disag
69       G 1998  3011820800     1  1.24277663 -1.6012177  6.11376190     Disag
70       G 1999  3296283392     1  1.23420024 -1.6217614  7.16892195     Disag

The within standard deviation should capture the variation within a country over the years, whereas the between standard deviation should capture the variation between countries. The output should therefore be three standard deviations (within, between and overall) for every variable (here: x1, x2, x3). PS: I am also using the plm and reshape2 packages.
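One common convention for these three quantities can be written as below. This is a sketch, not the only possible definition; in particular, the denominators are an assumption modelled on how Stata's xtsum is usually described. Here x_{it} is the value for country i in year t, \bar{x}_i is the mean of country i, \bar{x} is the grand mean, n is the number of countries and N is the total number of observations.

```latex
\begin{align}
s_{\text{overall}} &= \sqrt{\frac{1}{N-1}\sum_{i}\sum_{t}\left(x_{it}-\bar{x}\right)^2} \\
s_{\text{between}} &= \sqrt{\frac{1}{n-1}\sum_{i}\left(\bar{x}_i-\bar{x}\right)^2} \\
s_{\text{within}}  &= \sqrt{\frac{1}{N-1}\sum_{i}\sum_{t}\left(x_{it}-\bar{x}_i\right)^2}
\end{align}
```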

EDIT: In the second step I am calculating the mean for every country by

library(dplyr)
Panel_mean <- Panel %>% group_by(country) %>% summarise(mean(x1), mean(x2), mean(x3))

I then get the variance between countries with:

Panel %>% group_by(country) %>% summarise(across(c(x1, x2, x3), mean)) %>%
summarise(across(c(x1, x2, x3), var))

and the variance between years with:

Panel %>% group_by(year) %>% summarise(across(c(x1, x2, x3), mean)) %>%
summarise(across(c(x1, x2, x3), var))

EDIT 2: Because it was asked, here are my next steps: I want to determine country-specific regressors and plot the unconditional correlations between y and each of these regressors. For each variable I want three groups of plots: 1. the overall correlation; 2. the correlation of the deviations of y and the regressors from their country means (within variation); 3. the correlation of the country means of the variables (between variation).

Here is an example of the desired output: [image not included]

For the overall correlation I guess I could simply use lm (instead of the plm used for the panel data analysis), as in:

plot(Panel$x1, Panel$y)
abline(lm(y ~ x1, data = Panel))

Or am I completely on the wrong track?
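The three groups of plots described above could be sketched roughly as follows in base R. This is a minimal illustration on made-up data, not the Panel data set: the toy data frame `panel` and the column names `x1_within` and `y_within` are hypothetical and chosen only for this example.

```r
set.seed(1)  # reproducible toy data (hypothetical values)
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 5),
  x1      = rnorm(15)
)
panel$y <- 2 * panel$x1 + rnorm(15)

# Within: deviations of each variable from its own country mean
panel$x1_within <- panel$x1 - ave(panel$x1, panel$country)
panel$y_within  <- panel$y  - ave(panel$y,  panel$country)

# Between: one row per country holding the country means
between <- aggregate(cbind(x1, y) ~ country, data = panel, FUN = mean)

# Three scatter plots side by side, each with its own lm() fit line
op <- par(mfrow = c(1, 3))
plot(y ~ x1, data = panel, main = "Overall")
abline(lm(y ~ x1, data = panel))
plot(y_within ~ x1_within, data = panel, main = "Within")
abline(lm(y_within ~ x1_within, data = panel))
plot(y ~ x1, data = between, main = "Between")
abline(lm(y ~ x1, data = between))
par(op)
```

With the real data, the same steps would presumably be repeated for x2 and x3, replacing `panel` with `Panel`.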

Upvotes: 1

Views: 5595

Answers (2)

DatamineR

Reputation: 9618

You can do a lot of calculations with these results. The question is whether they are useful for your purpose. What is the objective of your analysis, and what question do you want to answer with it?

Upvotes: 1

DatamineR

Reputation: 9618

You can do this using dplyr (here df is the Panel data frame from the question):

# The within-country variance:
df %>% group_by(country) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [7 x 4]

  country    var(x1)     var(x2)   var(x3)
1       A 0.01689254 0.019945743 0.3459071
2       B 0.11111015 0.014658133 0.1578417
3       C 0.01376573 0.004341126 0.3684358
4       D 0.05922682 0.030828768 0.4438790
5       E 0.16660745 0.114101310 0.6562002
6       F 0.30408784 0.029109927 0.3974615
7       G 0.03731913 0.012823557 4.3677278

# The within-year variance:
df %>% group_by(year) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [10 x 4]

   year   var(x1)  var(x2)   var(x3)
1  1990 0.4565977 2.215550 0.2904437
2  1991 0.1906246 2.501216 0.2097600
3  1992 0.2307872 2.103001 0.5223656
4  1993 0.2783625 2.172129 0.8009998
5  1994 0.1505808 2.647259 1.1734290
6  1995 0.1356406 1.794507 1.2216286
7  1996 0.1179536 1.909766 2.9574045
8  1997 0.2380631 2.155005 3.5644637
9  1998 0.1375272 2.085431 5.0101764
10 1999 0.2455796 2.004060 6.2910426

# And the overall variance:

 apply(df[5:7], 2, var)
       x1        x2        x3 
0.2190896 1.8799138 2.0918771 
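The per-country and per-year variances above give one number per group. To condense them into a single within, between and overall standard deviation per variable, something like the following base R sketch might help. It assumes the convention where "within" is the SD of the deviations from each country's mean and "between" is the SD of the country means. The helper `panel_sd` and the toy data frame `panel` are hypothetical names used only for this illustration.

```r
# Toy panel (hypothetical values): 3 countries observed over 4 years
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(1990:1993, times = 3),
  x1      = 1:12,                    # varies within and between countries
  x2      = rep(c(1, 2, 3), each = 4)  # varies only between countries
)

# Hypothetical helper: overall, between and within SD of one variable
panel_sd <- function(x, group) {
  group_mean <- ave(x, group)  # each row gets the mean of its own group
  c(
    overall = sd(x),                        # all observations pooled
    between = sd(tapply(x, group, mean)),   # SD of the group means
    within  = sd(x - group_mean)            # SD of deviations from group means
  )
}

# One column per variable, one row per type of standard deviation
sapply(panel[c("x1", "x2")], panel_sd, group = panel$country)
```

In the toy data x2 is constant within each country, so its within SD is 0. For the data in the question, the call would presumably be `sapply(df[c("x1", "x2", "x3")], panel_sd, group = df$country)`.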

Upvotes: 2
