Reputation: 33
I have a dataset that looks like this:
country1 | country2 | group |
---|---|---|
China | Philippines | 68a |
China | Thailand | 68a |
Bahamas | Jamaica | 176a |
Bahamas | Grenada | 176a |
I need to transform the above dataset into like this:
country1 | country2 | group |
---|---|---|
China | Philippines | 68a |
China | Thailand | 68a |
Philippines | China | 68a |
Philippines | Thailand | 68a |
Thailand | China | 68a |
Thailand | Philippines | 68a |
Bahamas | Jamaica | 176a |
Bahamas | Grenada | 176a |
Jamaica | Bahamas | 176a |
Jamaica | Grenada | 176a |
Grenada | Bahamas | 176a |
Grenada | Jamaica | 176a |
I tried my best to follow the Stata code in this article: https://www.stata.com/support/faqs/data-management/expanding-datasets-to-all-pairs/. However, I ended up with a dataset that looks like this:
country1 | country2 | group |
---|---|---|
China | Philippines | 68a |
China | Philippines | 68a |
China | Thailand | 68a |
China | Thailand | 68a |
Bahamas | Jamaica | 176a |
Bahamas | Jamaica | 176a |
Bahamas | Grenada | 176a |
Bahamas | Grenada | 176a |
I'm not sure what I'm doing wrong.
Upvotes: 1
Views: 611
Reputation: 33
I was able to generate my desired dataset with my codes below but I'm hoping to find a more straightforward way to code it.
use "original dataset", clear
drop country2
save "temp_country1", replace
use "original dataset", clear
drop country1
ren country2 country1
append using "temp_country1"
//drop duplicates//
sort number country1
quietly by number country1: gen dup = cond(_N==1,0,_n)
drop if dup>1
drop dup
save "temp_country1_final", replace
use "temp_country1_final", clear
ren country1 country2
save "temp_country2.dta", replace
use "temp_country1_final", clear
joinby number using "temp_country2.dta"
order country1 country2 number name
drop if country1==country2
Upvotes: 0
Reputation: 973
I think the issue is because your unique identifier is really the combination of two columns (country1
and country2
) whereas in the example you are following there is a unique id
column. Here is how I would do this with your example, provided your dataset isn't incredibly large:
clear
input str40(country1 country2 group)
"China" "Philippines" "68a"
"China" "Thailand" "68a"
"Bahamas" "Jamaica" "176a"
"Bahamas" "Grenada" "176a"
end
egen pair_id = group(country1 country2) // Create unique pair id
reshape long country, i(group pair_id) j(j ) // reshape all countries long
drop pair_id j
rename country country1
* create duplicate dataset to fulljoin
preserve
rename country country2
keep country2 group
tempfile cross
save `cross', replace
restore
joinby group using `cross' // full join
drop if country1 == country2
* Some tidying to match example output
order country1 country2 group
gsort -group country1 country2
duplicates drop
Upvotes: 2