Ptru

Reputation: 55

How to correlate GDP across countries over the years?

I'm trying to find the correlation of GDP across countries using Stata. I'm using the data from Penn World Tables 8.1 (http://www.rug.nl/research/ggdc/data/pwt/v81/pwt81.zip). It is a large table with many macroeconomic statistics, but I'm essentially interested only in the variables country, year and rgdpna (GDP).

I've been creating a new variable for each country I'm interested in and then using pwcorr to correlate them. However, this method generates a lot of missing values and gives no correlations. My code is:

/*We generate variables for countries*/

/*We use Sweden as reference point and find 5 near-by countries.
The chosen countries are Sweden, Norway, Finland, Germany, Denmark*/
gen swe = rgdpna if country == "Sweden" & year >= 1997
gen nor = rgdpna if country == "Norway" & year >= 1997
gen fin = rgdpna if country == "Finland" & year >= 1997
gen ger = rgdpna if country == "Germany" & year >= 1997
gen den = rgdpna if country == "Denmark" & year >= 1997

/*Then we choose 5 far-away countries. The chosen countries are
Canada, China, Japan, Russia, US*/
gen can = rgdpna if country == "Canada" & year >= 1997
gen usa = rgdpna if country == "United States" & year >= 1997
gen rus = rgdpna if country == "Russian Federation" & year >= 1997
gen chn = rgdpna if country == "China, People's Republic of" & year >= 1997
gen jap = rgdpna if country == "Japan" & year >= 1997

/*pwcorr the variables*/
pwcorr swe nor fin ger den can usa rus chn

This gives the following result:

             |      swe      nor      fin      ger      den      can      usa
-------------+---------------------------------------------------------------
         swe |   1.0000 
         nor |        .   1.0000 
         fin |        .        .   1.0000 
         ger |        .        .        .   1.0000 
         den |        .        .        .        .   1.0000 
         can |        .        .        .        .        .   1.0000 
         usa |        .        .        .        .        .        .   1.0000 
         rus |        .        .        .        .        .        .        . 
         chn |        .        .        .        .        .        .        . 

             |      rus      chn
-------------+------------------
         rus |   1.0000 
         chn |        .   1.0000 

Does anybody know how to fix this?

Upvotes: 1

Views: 671

Answers (1)

Nick Cox

Reputation: 37183

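You have a panel data structure, so different countries are in different observations. Each observation holds one country in one year, so any two of the variables you generated are never both non-missing in the same observation. Hence you should not be surprised that results are missing unless you compare a country with itself. You need to reshape first, something like this.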

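keep if year >= 1997
local c1 inlist(country, "Sweden", "Norway", "Finland", "Germany", "Denmark")
local c2 inlist(country, "Canada", "United States", "Russian Federation", "China, People's Republic of", "Japan")
keep if `c1' | `c2'
/*Country names with spaces or commas cannot be variable-name suffixes,
so number the countries (alphabetical order) and reshape on that number*/
egen id = group(country), label
/*Drop everything else: reshape wide needs the other variables to be constant within year*/
keep year id rgdpna
reshape wide rgdpna, i(year) j(id)
pwcorr rgdpna*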
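
To see why the layout matters, here is a minimal sketch on made-up data. The countries "A" and "B", the variable gdp and all the numbers are hypothetical and are not taken from the Penn World Tables. It first reproduces the problem in the long layout and then correlates after reshaping to wide.

```stata
* Toy long-format panel: two hypothetical countries, three years (made-up numbers)
clear
input str1 country year gdp
"A" 1997 10
"A" 1998 12
"A" 1999 15
"B" 1997 20
"B" 1998 21
"B" 1999 25
end

* Long layout: gdpA and gdpB are never both non-missing in the same observation,
* so pwcorr has no pairs to work with
gen gdpA = gdp if country == "A"
gen gdpB = gdp if country == "B"
count if !missing(gdpA, gdpB)

* Wide layout: one observation per year, one variable per country
drop gdpA gdpB
reshape wide gdp, i(year) j(country) string
pwcorr gdpA gdpB
```

The string option works here only because "A" and "B" are valid variable-name suffixes. Full country names containing spaces or commas would not be, which is one reason to reshape on a numeric identifier instead.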

Upvotes: 2
