Stata merge command when identifier variables have same information but different column names

Sample data:

Location X
LA 10
San Diego 12
New York 15
Mumbai 9
City Y
LA 8
San Diego 2
New York 5
Mumbai 12

I have these two datasets for instance, I am trying to merge them in Stata which usually requires the identifier variable to have the same column name. My task is to merge this data without manually changing anything in them. How do I merge these to tables to get a table of the columns: Location, X, Y when the identifiers have a different name?

Upvotes: 0

Views: 2211

Answers (2)

langtang
langtang

Reputation: 24845

This is an alternative approach using frames

* create frame d2 and put data in it
frame create d2
frame d2: input str9 city byte y
"LA"         8
"San Diego"  2
"New York"   5
"Mumbai"    12
end

* put data in the default frame
input str9 location byte x
"LA"        10
"San Diego" 12
"New York"  15
"Mumbai"     9
end

* link the frames using frlink; notice that location is linked to city, even
* though the names are different
frlink 1:1 location, frame(d2 city)

* get the variables from the d2 frame using the link id, and then drop the link id
frget city y, from(d2)
drop d2
list,clean

Output:

        location    x    y  
  1.          LA   10    8  
  2.   San Diego   12    2  
  3.    New York   15    5  
  4.      Mumbai    9   12  

Upvotes: 1

TheIceBear
TheIceBear

Reputation: 3255

You can use temporary files that are only stored in memory and not on disk where you can create a variable that you can merge on. This will leave the files with the original data on your disk unchanged while you still are able to merge them.

If this does not solve your question, can you elaborate on the restriction "merge this data without manually changing anything in them".

Example:

* Load data set 1
clear
input str9 location byte x
"LA"        10
"San Diego" 12
"New York"  15
"Mumbai"     9
end

* Generate an identifier without changing original identifier
gen merge_id = location

* Save data in a temp file
tempfile data1
save `data1'

* Load data set 2
clear
input str9 city byte y
"LA"         8
"San Diego"  2
"New York"   5
"Mumbai"    12
end

* Generate an identifier without changing original identifier
gen merge_id = city

* Merge datasets and drop the merge identifier
merge 1:1 merge_id using `data1'
drop merge_id

Upvotes: 1

Related Questions