Reputation: 3
I have two datasets.
One containing most of my data:
ISIN Name Year TotalAssets Turnover RoA Employees
1. Nr1 Bank1 2005 30.000 1 0,5 300
2. Nr1 Bank1 2006 31.000 1 0,3 300
3. Nr1 Bank1 2007 28.000 1 0,3 280
4. Nr2 Bank2 2005 150.000 0 0,8 2000
5. Nr2 Bank2 2006 - 0 - 2200
6. Nr2 Bank2 2007 - 0 - 2200
7. Nr3 Bank3 2005 20.000 1 0,7 275
And another containing some missing information from the first dataset:
ISIN Name Year TotalAssets RoA
1. Nr2 Bank2 2005 150.000 0,8
2. Nr2 Bank2 2006 180.000 1,3
3. Nr2 Bank2 2007 170.000 0,7
4. Nr2 Bank2 2008 150.000 0,4
When I combine these datasets using:
merge 1:1 ISIN Year using "Dataset2"
for some reason, only the master only (1)
and using only (2)
match the variables TotalAssets
and RoA
, while matched(3)
gives missing data for every case on the TotalAssets
and RoA
variable.
The merged dataset looks something like this:
ISIN Name Year TotalAssets Turnover RoA Employees _merge
1. Nr1 Bank1 2005 30.000 1 0,5 300 master only (1)
2. Nr1 Bank1 2006 31.000 1 0,3 300 master only (1)
3. Nr1 Bank1 2007 28.000 1 0,3 280 master only (1)
4. Nr2 Bank2 2005 150.000 0 0,8 2000 master only (1)
5. Nr2 Bank2 2006 - 0 - 2200 matched(3)
6. Nr2 Bank2 2007 - 0 - 2200 matched(3)
7. Nr2 Bank2 2008 150.000 - 0,4 - using only (2)
8. Nr3 Bank3 2005 20.000 1 0,7 275 master only (1)
However the matched
don't show any data for the variables on which it's supposed to merge
.
Am I doing something wrong or how would I go about doing it in a better way?
Upvotes: 0
Views: 1594
Reputation:
What you need in this particular case is the update
option of merge
, not replace
.
Using your example:
clear
input str10(ISIN) str10(Name) Year TotalAssets Turnover RoA Employees
Nr1 Bank1 2005 30000 1 0.5 300
Nr1 Bank1 2006 31000 1 0.3 300
Nr1 Bank1 2007 28000 1 0.3 280
Nr2 Bank2 2005 150000 0 0.8 2000
Nr2 Bank2 2006 . 0 . 2200
Nr2 Bank2 2007 . 0 . 2200
Nr3 Bank3 2005 20000 1 0.7 275
end
save data1, replace
clear
input str10(ISIN) str10(Name) Year TotalAssets RoA
Nr2 Bank2 2005 150000 0.8
Nr2 Bank2 2006 180000 1.3
Nr2 Bank2 2007 170000 0.7
Nr2 Bank2 2008 150000 0.4
end
save data2, replace
use data1,clear
merge 1:1 ISIN Year using data2, update
Which gives you:
list
+----------------------------------------------------------------------------------+
| ISIN Name Year TotalA~s Turnover RoA Employ~s _merge |
|----------------------------------------------------------------------------------|
1. | Nr1 Bank1 2005 30000 1 .5 300 master only (1) |
2. | Nr1 Bank1 2006 31000 1 .3 300 master only (1) |
3. | Nr1 Bank1 2007 28000 1 .3 280 master only (1) |
4. | Nr2 Bank2 2005 150000 0 .8 2000 matched (3) |
5. | Nr2 Bank2 2006 180000 0 1.3 2200 missing updated (4) |
|----------------------------------------------------------------------------------|
6. | Nr2 Bank2 2007 170000 0 .7 2200 missing updated (4) |
7. | Nr2 Bank2 2008 150000 . .4 . using only (2) |
8. | Nr3 Bank3 2005 20000 1 .7 275 master only (1) |
+----------------------------------------------------------------------------------+
There are two missing values in observation 7 because the variables Turnover
and Employees
are not present in the second dataset.
Upvotes: 1
Reputation:
Look into the options for help merge
, particularly update
and replace
. In your specific case, it looks like the update
option would be sufficient (_merge==4
). Stata will view the master
dataset as inviolable unless you explicitly give it instructions on how to deal with same-named variables in the using
dataset.
Upvotes: 1