Reputation: 55
I'm trying to find the correlation between GDP across countries using Stata. I'm using the data available from Penn World Tables 8.1 (http://www.rug.nl/research/ggdc/data/pwt/v81/pwt81.zip). It is a large table with many macroeconomic statistics, but I'm only interested in the variables country, year and rgdpna (GDP).
I've been creating a new variable for each country I'm interested in and using pwcorr to correlate them. However, this method generates a lot of missing values and gives no correlations. My code is:
/*We generate variables for countries*/
/*We use Sweden as reference point and find 5 near-by countries.
The chosen countries are Sweden, Norway, Finland, Germany, Denmark*/
gen swe = rgdpna if country == "Sweden" & year >= 1997
gen nor = rgdpna if country == "Norway" & year >= 1997
gen fin = rgdpna if country == "Finland" & year >= 1997
gen ger = rgdpna if country == "Germany" & year >= 1997
gen den = rgdpna if country == "Denmark" & year >= 1997
/*Then we choose 5 far-away countries. The chosen countries are
Canada, China, Japan, Russia, US*/
gen can = rgdpna if country == "Canada" & year >= 1997
gen usa = rgdpna if country == "United States" & year >= 1997
gen rus = rgdpna if country == "Russian Federation" & year >= 1997
gen chn = rgdpna if country == "China, People's Republic of" & year >= 1997
gen jap = rgdpna if country == "Japan" & year >= 1997
/*pwcorr the variables*/
pwcorr swe nor fin ger den can usa rus chn
This gives the following result:
             |      swe      nor      fin      ger      den      can      usa
-------------+---------------------------------------------------------------
         swe |   1.0000
         nor |        .   1.0000
         fin |        .        .   1.0000
         ger |        .        .        .   1.0000
         den |        .        .        .        .   1.0000
         can |        .        .        .        .        .   1.0000
         usa |        .        .        .        .        .        .   1.0000
         rus |        .        .        .        .        .        .        .
         chn |        .        .        .        .        .        .        .

             |      rus      chn
-------------+------------------
         rus |   1.0000
         chn |        .   1.0000
Does anybody know how to fix this?
Upvotes: 1
Views: 671
Reputation: 37183
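You have a panel data structure, so different countries are in different observations. Each variable you generated is nonmissing only in its own country's rows, so no observation has values for two countries at once and pwcorr has no pairs to work with. Hence you should not be surprised that results are missing unless you compare a country with itself. You need to reshape first, something like this: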
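* Keep the sample period and the ten countries of interest
keep if year >= 1997
local c1 inlist(country, "Sweden", "Norway", "Finland", "Germany", "Denmark")
local c2 inlist(country, "Canada", "United States", "Russian Federation", "China, People's Republic of", "Japan")
keep if `c1' | `c2'
* Number the countries 1 to 10 in sorted order of name; names with spaces
* and punctuation cannot serve as variable-name suffixes in reshape
egen id = group(country), label
label list
* Keep only i(), j() and the stub, so nothing else varies within year
keep year id rgdpna
reshape wide rgdpna, i(year) j(id)
* One observation per year, one variable per country: rgdpna1 ... rgdpna10
pwcorr rgdpna*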
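To see the point on a small scale, here is a minimal sketch on a made-up two-country panel. The country names Xland and Yland, the variable gdp and all numbers are invented for illustration and are not PWT data. It first repeats the one-variable-per-country approach and counts how many observations have both variables nonmissing, then puts the data in wide layout and counts again.

```stata
* Toy panel in long layout: one row per country-year (invented numbers)
clear
input str10 country int year double gdp
"Xland" 2000 100
"Xland" 2001 110
"Xland" 2002 125
"Yland" 2000 200
"Yland" 2001 190
"Yland" 2002 230
end

* One variable per country, as in the question
gen x = gdp if country == "Xland"
gen y = gdp if country == "Yland"
* Count rows where both are nonmissing: there are none
count if !missing(x, y)

* Wide layout: one row per year, one gdp column per country
drop x y
egen id = group(country)
keep year id gdp
reshape wide gdp, i(year) j(id)
* Now every row has both countries, so a correlation can be computed
count if !missing(gdp1, gdp2)
pwcorr gdp1 gdp2
```

The first count reports 0 and the second reports 3, which is why pwcorr returns missing values in the long layout and a number in the wide one.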
Upvotes: 2